Sciweavers

Information Retrieval by Semantic Similarity
WWW
2007
ACM
14 years 8 months ago
Efficient search in large textual collections with redundancy
Current web search engines focus on searching only the most recent snapshot of the web. In some cases, however, it would be desirable to search over collections that include many ...
Jiangong Zhang, Torsten Suel
SIGMOD
2008
ACM
14 years 7 months ago
Cost-based variable-length-gram selection for string collections to support approximate queries efficiently
Approximate queries on a collection of strings are important in many applications such as record linkage, spell checking, and Web search, where inconsistencies and errors exist in...
Xiaochun Yang, Bin Wang, Chen Li
WSDM
2010
ACM
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
JCDL
2006
ACM
14 years 1 months ago
A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books
A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo an...
Shaolei Feng, R. Manmatha
DL
1998
Springer
13 years 12 months ago
CiteSeer: An Automatic Citation Indexing System
We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to ...
C. Lee Giles, Kurt D. Bollacker, Steve Lawrence