Sciweavers

290 search results - page 54 / 58
» Document normalization revisited
WSDM
2010
ACM
204 views, Data Mining
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
SIGIR
2009
ACM
14 years 1 month ago
Addressing morphological variation in alphabetic languages
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normaliz...
Paul McNamee, Charles K. Nicholas, James Mayfield
ICDM
2005
IEEE
122 views, Data Mining
14 years 1 month ago
ViVo: Visual Vocabulary Construction for Mining Biomedical Images
Given a large collection of medical images of several conditions and treatments, how can we succinctly describe the characteristics of each setting? For example, given a large col...
Arnab Bhattacharya, Vebjorn Ljosa, Jia-Yu Pan, Mar...
CCS
2001
ACM
13 years 12 months ago
Taking the Copy Out of Copyright
Under current U.S. law and common understanding, the fundamental right granted by copyright is the right of reproduction – of making copies. Indeed, the very word “copyright”...
Ernest Miller, Joan Feigenbaum
KDD
2010
ACM
250 views, Data Mining
13 years 9 months ago
On community outliers and their efficient detection in information networks
Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected...
Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Su...