Sciweavers

295 search results - page 46 / 59
» Web Crawling
Sort
View
PVLDB
2010
161views more  PVLDB 2010»
13 years 8 months ago
Annotating and Searching Web Tables Using Entities, Types and Relationships
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational...
Girija Limaye, Sunita Sarawagi, Soumen Chakrabarti
CIKM
2009
Springer
14 years 4 months ago
Vetting the links of the web
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
Na Dai, Brian D. Davison
CIKM
2009
Springer
14 years 4 months ago
Graph-based seed selection for web-scale crawlers
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
SIGIR
2008
ACM
13 years 9 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
ISW
2009
Springer
14 years 4 months ago
Automated Spyware Collection and Analysis
Various online studies on the prevalence of spyware attest overwhelming numbers (up to 80%) of infected home computers. However, the term spyware is ambiguous and can refer to anyt...
Andreas Stamminger, Christopher Kruegel, Giovanni ...