Sciweavers

611 search results - page 18 / 123
» Random web crawls
WIDM
2006
ACM
14 years 1 month ago
Lazy preservation: reconstructing websites by crawling the crawlers
Backup of websites is often not considered until after a catastrophic event has occurred to either the website or its webmaster. We introduce “lazy preservation” – digital p...
Frank McCown, Joan A. Smith, Michael L. Nelson
CORR
2012
Springer
12 years 3 months ago
Optimal Threshold Control by the Robots of Web Search Engines with Obsolescence of Documents
A typical web search engine consists of three principal parts: crawling engine, indexing engine, and searching engine. The present work aims to optimize the performance of the cra...
Konstantin Avrachenkov, Alexander N. Dudin, Valent...
WWW
2007
ACM
14 years 8 months ago
Parallel crawling for online social networks
Given a huge online social network, how do we retrieve information from it through crawling? Even better, how do we improve the crawling performance by using parallel crawlers tha...
Duen Horng Chau, Shashank Pandit, Samuel Wang, Chr...
SIGIR
2002
ACM
13 years 7 months ago
Do TREC web collections look like the web?
We measure the WT10g test collection, used in the TREC-9 and TREC 2001 Web Tracks, and the .GOV test collection used in the TREC 2002 Web and Interactive Tracks, with common measu...
Ian Soboroff
ERCIMDL
2003
Springer
14 years 28 days ago
Topical Crawling for Business Intelligence
The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging appropriate inform...
Gautam Pant, Filippo Menczer