Sciweavers

Search results for “Crawling the Hidden Web” (472 results, page 17 of 95)
CORR
2012
Springer
Optimal Threshold Control by the Robots of Web Search Engines with Obsolescence of Documents
A typical web search engine consists of three principal parts: crawling engine, indexing engine, and searching engine. The present work aims to optimize the performance of the cra...
Konstantin Avrachenkov, Alexander N. Dudin, Valent...
WWW
2007
ACM
Parallel crawling for online social networks
Given a huge online social network, how do we retrieve information from it through crawling? Even better, how do we improve the crawling performance by using parallel crawlers tha...
Duen Horng Chau, Shashank Pandit, Samuel Wang, Chr...
SIGIR
2002
ACM
Do TREC web collections look like the web?
We measure the WT10g test collection, used in the TREC-9 and TREC 2001 Web Tracks, and the .GOV test collection used in the TREC 2002 Web and Interactive Tracks, with common measu...
Ian Soboroff
ERCIMDL
2003
Springer
Topical Crawling for Business Intelligence
The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging appropriate inform...
Gautam Pant, Filippo Menczer
WWW
2010
ACM
New-web search with microblog annotations
Web search engines discover indexable documents by recursively ‘crawling’ from a seed URL. Their rankings take into account link popularity. While this works well, it introduc...
Tom Rowlands, David Hawking, Ramesh Sankaranarayan...