Sciweavers

218 search results - page 4 / 44
» Crawling for Images on the WWW
Sort
View
WWW
2005
ACM
14 years 8 months ago
Focused crawling by exploiting anchor text using decision tree
Focused crawlers are considered as a promising way to tackle the scalability problem of topic-oriented or personalized search engines. To design a focused crawler, the choice of s...
Jun Li, Kazutaka Furuse, Kazunori Yamaguchi
WWW
2007
ACM
14 years 8 months ago
Parallel crawling for online social networks
Given a huge online social network, how do we retrieve information from it through crawling? Even better, how do we improve the crawling performance by using parallel crawlers tha...
Duen Horng Chau, Shashank Pandit, Samuel Wang, Chr...
WWW
2007
ACM
14 years 8 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2009
ACM
14 years 8 months ago
Sitemaps: above and beyond the crawl of duty
Comprehensive coverage of the public web is crucial to web search engines. Search engines use crawlers to retrieve pages and then discover new ones by extracting the pages' o...
Uri Schonfeld, Narayanan Shivakumar