Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web

This paper describes a decentralized peer-to-peer model for building a Web crawler. Most current systems use a centralized client-server model: the crawl itself is done by one or more tightly coupled machines, but the distribution of crawling jobs and the collection of crawled results are managed centrally, using a centralized URL repository. Centralized solutions are known to suffer from link congestion, a single point of failure, and expensive administration. They require both horizontal and vertical scalability solutions to manage Network File Systems (NFS) and to load-balance DNS and HTTP requests. In this paper, we present an architecture of a completely distributed and decentralized Peer-to-Peer (P2P) crawler called Apoidea, which is self-managing and uses geographical proximity of the web resources to the peers for a better and faster crawl. We use Distributed Hash Table (DHT) based protocols to perform the critical URL-duplicate and content-...
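
To make the idea of DHT-based URL-duplicate detection concrete, here is a minimal single-process sketch in Python. It is not Apoidea's actual protocol: the abstract above is truncated and does not specify the hash function, the key (full URL versus domain), or the DHT used. The sketch assumes a simple consistent-hashing ring in which each URL is owned by the first peer whose identifier follows the URL's hash, and that owning peer alone decides whether the URL has been seen before. The names `ring_hash`, `Ring`, `owner`, and `offer` are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on a 2**32 identifier ring (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class Ring:
    """Toy consistent-hashing ring; all peers live in one process (assumption)."""

    def __init__(self, peer_names):
        # Place every peer on the ring, sorted by identifier.
        self._points = sorted((ring_hash(name), name) for name in peer_names)
        self._ids = [point for point, _ in self._points]
        # Each peer keeps its own set of URLs it has already accepted.
        self.seen = {name: set() for name in peer_names}

    def owner(self, url: str) -> str:
        """Return the peer responsible for url: its successor on the ring."""
        i = bisect.bisect_left(self._ids, ring_hash(url)) % len(self._ids)
        return self._points[i][1]

    def offer(self, url: str) -> bool:
        """Route url to its owner; return True only if the owner has not seen it."""
        peer = self.owner(url)
        if url in self.seen[peer]:
            return False
        self.seen[peer].add(url)
        return True


ring = Ring(["peer-a", "peer-b", "peer-c"])
print(ring.offer("http://example.com/"))  # True: first time this URL is offered
print(ring.offer("http://example.com/"))  # False: the owning peer already has it
```

Because every peer computes the same owner for a given URL, no central URL repository is needed to answer the duplicate test. In a real deployment the lookup would be a network message routed through the DHT rather than a local dictionary access.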

Aameek Singh, Mudhakar Srivatsa, Ling Liu, Todd Mi

Centralized | Centralized Client-server Model | Decentralized Peer-to-peer Model | SIGIR 2003 |

Post Info

Added	05 Jul 2010
Updated	05 Jul 2010
Type	Conference
Year	2003
Where	SIGIR
Authors	Aameek Singh, Mudhakar Srivatsa, Ling Liu, Todd Miller
