Sciweavers

299 search results - page 49 / 60
» User-centric Web crawling
Sort
View
CIDR
2009
129views Algorithms» more  CIDR 2009»
13 years 9 months ago
Extracting and Querying a Comprehensive Web Database
Recent research in domain-independent information extraction holds the promise of an automatically-constructed structured database derived from the Web. A query system based on th...
Michael J. Cafarella
SIGIR
2008
ACM
13 years 8 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
ICIP
2000
IEEE
14 years 10 months ago
Efficient Video Similarity Measurement and Search
We consider the use of meta-data and/or video-domain methods to detect similar videos on the web. Meta-data is extracted from the textual and hyperlink information associated with...
Sen-Ching S. Cheung, Avideh Zakhor
WWW
2004
ACM
14 years 9 months ago
Combining link and content analysis to estimate semantic similarity
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
Filippo Menczer
CIKM
2009
Springer
14 years 3 months ago
Graph-based seed selection for web-scale crawlers
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Shuyi Zheng, Pavel Dmitriev, C. Lee Giles