Designing efficient sampling techniques to detect webpage updates


Due to resource constraints, Web archiving systems and search engines usually have difficulty keeping their entire local repository synchronized with the Web. We advance the state of the art in sampling-based synchronization techniques by answering a challenging question: given a sampled webpage and its change status, which other webpages are also likely to change? We present a study of various downloading granularities and policies, and propose an adaptive model based on the update history and the popularity of the webpages. We run extensive experiments on a large dataset of approximately 300,000 webpages to demonstrate that additional updated webpages are most likely to be found in the current or upper directories of the changed samples. Moreover, the adaptive strategies outperform the non-adaptive one in detecting important changes.

Terms: Management, Design, Algorithms, Experimentation
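The directory-based finding and the adaptive idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the example URLs, the linear scoring formula, and the `weight` parameter are all assumptions made for illustration. The abstract only states that update history and popularity are used, not how they are combined.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlsplit


def directory_of(url: str) -> str:
    """Return host plus directory path, e.g. 'example.org/news/2007'."""
    parts = urlsplit(url)
    return parts.netloc + dirname(parts.path).rstrip("/")


def parent_of(directory: str) -> str:
    """Return the directory one level up, or '' when already at the host root."""
    return directory.rsplit("/", 1)[0] if "/" in directory else ""


def candidates_after_change(changed_url: str, repository: list[str]) -> list[str]:
    """List other known pages in the current or parent directory of a changed sample."""
    by_dir = defaultdict(list)
    for url in repository:
        by_dir[directory_of(url)].append(url)

    current = directory_of(changed_url)
    targets = [current]
    parent = parent_of(current)
    if parent:
        targets.append(parent)
    return [u for d in targets for u in by_dir[d] if u != changed_url]


def priority(change_count: int, visit_count: int, popularity: float,
             weight: float = 0.5) -> float:
    """Hypothetical score: observed change rate blended with a popularity in [0, 1].

    The linear blend and the default weight are assumptions, not taken from the paper.
    """
    change_rate = change_count / visit_count if visit_count else 0.0
    return weight * change_rate + (1 - weight) * popularity
```

Under these assumptions, a crawler that observes a change on a sampled page would call `candidates_after_change` to pick the pages to re-download next, and could sort them by `priority` so that frequently changing, popular pages are fetched first.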

Qingzhao Tan, Ziming Zhuang, Prasenjit Mitra, C. Lee Giles


Entire Local Repository | Internet Technology | Sampling-based Synchronization Techniques | Web Archiving Systems | WWW 2007


» Graph Based Discriminative Learning for Robust and Efficient Object Tracking

» Efficient global optimization (EGO) for multiobjective problem and data mining

» Continual hashing for efficient fine-grain state inconsistency detection

» Online maintenance of very large random samples on flash storage

» Learning Features for Tracking

» LOMA: A fast method to generate efficient tagged-random primers despite amplification bias o...

» Object Boundary Detection For Ontology-Based Image Classification

» DVS for On-Chip Bus Designs Based on Timing Error Correction

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2007
Where	WWW
Authors	Qingzhao Tan, Ziming Zhuang, Prasenjit Mitra, C. Lee Giles


Sciweavers
