The Evolution of the Web and Implications for an Incremental Crawler

Download rose.cs.ucla.edu

In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can significantly improve the "freshness" of the collection and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half a million web pages over 4 months to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler that combines the best design choices.
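The contrast between batch refreshing and incremental updating can be illustrated with a small sketch. This is not the architecture proposed in the paper; it is a toy example under stated assumptions. The class name `IncrementalCrawler`, the injected `fetch` callable, and the rule of halving or doubling a page's revisit interval depending on whether its checksum changed are all hypothetical choices made for illustration.

```python
import hashlib
import heapq
from typing import Callable, Dict, List, Tuple


class IncrementalCrawler:
    """Toy incremental crawler (illustrative assumption, not the paper's design).

    Pages are revisited one at a time from a priority queue ordered by due
    time, instead of refreshing the whole collection in one batch.
    """

    def __init__(self, fetch: Callable[[str], str],
                 initial_interval: float = 1.0,
                 min_interval: float = 0.25,
                 max_interval: float = 16.0) -> None:
        self.fetch = fetch                      # hypothetical page fetcher
        self.initial_interval = initial_interval
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.checksums: Dict[str, str] = {}     # last seen content digest per URL
        self.intervals: Dict[str, float] = {}   # current revisit interval per URL
        self.queue: List[Tuple[float, str]] = []  # (due time, url) min-heap

    def add(self, url: str, now: float = 0.0) -> None:
        """Register a page and schedule its first visit at time `now`."""
        self.intervals[url] = self.initial_interval
        heapq.heappush(self.queue, (now, url))

    def step(self) -> Tuple[float, str, bool]:
        """Visit the page that is due next; return (time, url, changed)."""
        now, url = heapq.heappop(self.queue)
        digest = hashlib.sha256(self.fetch(url).encode("utf-8")).hexdigest()
        previous = self.checksums.get(url)
        changed = previous is not None and digest != previous
        self.checksums[url] = digest
        if previous is not None:
            # Assumed heuristic: revisit changing pages sooner, stable pages later.
            factor = 0.5 if changed else 2.0
            interval = self.intervals[url] * factor
            self.intervals[url] = min(self.max_interval,
                                      max(self.min_interval, interval))
        heapq.heappush(self.queue, (now + self.intervals[url], url))
        return now, url, changed
```

In use, `fetch` would wrap an HTTP client; for experimentation it can simply read from an in-memory dictionary, which makes it easy to simulate pages that change at different rates and to watch their revisit intervals diverge.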

Junghoo Cho, Hector Garcia-Molina

Database | Effective Incremental Crawler | Incremental Crawler | VLDB 2000 | Web Pages |

Post Info

Added	26 Aug 2010
Updated	26 Aug 2010
Type	Conference
Year	2000
Where	VLDB
Authors	Junghoo Cho, Hector Garcia-Molina

Sciweavers
