WWW 2002, ACM

Parallel crawlers

Download oak.cs.ucla.edu

In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish downloading pages in a reasonable amount of time. We first propose multiple architectures for a parallel crawler and identify fundamental issues related to parallel crawling. Based on this understanding, we then propose metrics to evaluate a parallel crawler, and compare the proposed architectures using 40 million pages collected from the Web. Our results clarify the relative merits of each architecture and provide a good guideline on when to adopt which architecture.

Keywords: Web Crawler, Web Spider, Parallelization

Junghoo Cho, Hector Garcia-Molina

Related Content

» Parallel crawling for online social networks

» Scrawler: A Seed-by-Seed Parallel Web Crawler

» A Focused Crawler with Ontology-Supported Website Models for Information Agents

» Bulk-Synchronous On-Line Crawling on Clusters of Computers

» Agent-Based Approach for Web Crawling

» Collaborative Web Crawling: Information Gathering/Processing over Internet

» Language Based Crawling: Crawling the Arabic Content of the Web

» Distributed Pagerank for P2P Systems

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2002
Where	WWW
Authors	Junghoo Cho, Hector Garcia-Molina