Sciweavers

178 search results - page 20 / 36
» Scheduling Algorithms for Web Crawling
ICIP
2000
IEEE
14 years 9 months ago
Efficient Video Similarity Measurement and Search
We consider the use of meta-data and/or video-domain methods to detect similar videos on the web. Meta-data is extracted from the textual and hyperlink information associated with...
Sen-Ching S. Cheung, Avideh Zakhor
CIKM
2009
Springer
14 years 2 months ago
Vetting the links of the web
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
Na Dai, Brian D. Davison
ICDM
2006
IEEE
14 years 1 months ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
WAW
2007
Springer
14 years 1 months ago
Approximating Betweenness Centrality
Betweenness is a centrality measure based on shortest paths, widely used in complex network analysis. It is computationally-expensive to exactly determine betweenness; currently th...
David A. Bader, Shiva Kintali, Kamesh Madduri, Mil...
SIGIR
2008
ACM
13 years 7 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...