Sciweavers

472 search results - page 7 / 95
» Crawling the Hidden Web
ADMA
2009
Springer
14 years 2 months ago
Crawling Deep Web Using a New Set Covering Algorithm
Abstract. Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...
Yan Wang, Jianguo Lu, Jessica Chen
WWW
2004
ACM
14 years 8 months ago
Distributed community crawling
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...
Fabrizio Costa, Paolo Frasconi
WWW
2007
ACM
14 years 8 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
STOC
2002
ACM
14 years 7 months ago
Crawling on web graphs
Colin Cooper, Alan M. Frieze
WWW
2005
ACM
14 years 8 months ago
User-centric Web crawling
Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages...
Sandeep Pandey, Christopher Olston