Sciweavers

1286 search results - page 149 / 258
KDD
2008
ACM
183 views, Data Mining
14 years 8 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
SIGMOD
2009
ACM
190 views, Database
14 years 8 months ago
Optimizing complex extraction programs over evolving text data
Most information extraction (IE) approaches have considered only static text corpora, over which we apply IE only once. Many real-world text corpora however are dynamic. They evol...
Fei Chen 0002, Byron J. Gao, AnHai Doan, Jun Yang ...
MM
2009
ACM
221 views, Multimedia
14 years 2 months ago
Using large-scale web data to facilitate textual query based retrieval of consumer photos
The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of consumer photo collections. In this paper, we present a (quasi) real-time t...
Yiming Liu, Dong Xu, Ivor W. Tsang, Jiebo Luo
SIGMOD
2011
ACM
269 views, Database
12 years 11 months ago
Advancing data clustering via projective clustering ensembles
Projective Clustering Ensembles (PCE) are a very recent advance in data clustering research which combines the two powerful tools of clustering ensembles and projective clustering...
Francesco Gullo, Carlotta Domeniconi, Andrea Tagar...
KDD
2007
ACM
169 views, Data Mining
14 years 8 months ago
Exploiting underrepresented query aspects for automatic query expansion
Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely ...
Daniel Crabtree, Peter Andreae, Xiaoying Gao