WIDM 2006, ACM

Lazy preservation: reconstructing websites by crawling the crawlers

Backup of websites is often not considered until after a catastrophic event has occurred to either the website or its webmaster. We introduce “lazy preservation” – digital preservation performed as a result of the normal operation of web crawlers and caches. Lazy preservation is especially suitable for third parties; for example, a teacher reconstructing a missing website used in previous classes. We evaluate the effectiveness of lazy preservation by reconstructing 24 websites of varying sizes and composition using Warrick, a web-repository crawler. Because of varying levels of completeness in any one repository, our reconstructions sampled from four different web repositories: Google (44%), MSN (30%), Internet Archive (19%), and Yahoo (7%). We also measured the time required for web resources to be discovered and cached (10-103 days) as well as how long they remained in cache after deletion (7-61 days).

Categories and Subject Descriptors: H.3.5 [Information Storage and Retriev...
Frank McCown, Joan A. Smith, Michael L. Nelson
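
The mechanism behind this kind of web-repository crawling can be illustrated in a few lines: ask a repository whether it holds a copy of a lost URL and, if so, retrieve it. The sketch below is a minimal illustration under stated assumptions, not Warrick itself: it queries only the Internet Archive (via its public Wayback Machine availability API) rather than the four repositories used in the paper, and the lost-page URL is a hypothetical placeholder.

import json
import urllib.parse
import urllib.request


def find_cached_copy(url):
    """Return the URL of the closest archived snapshot of `url`, or None."""
    query = urllib.parse.urlencode({"url": url})
    api = "https://archive.org/wayback/available?" + query
    with urllib.request.urlopen(api, timeout=30) as resp:
        data = json.load(resp)
    snapshot = data.get("archived_snapshots", {}).get("closest")
    if snapshot and snapshot.get("available"):
        return snapshot["url"]
    return None


if __name__ == "__main__":
    # Hypothetical lost page: a full reconstruction would repeat this query
    # for every resource of the site and follow links in the recovered HTML.
    cached = find_cached_copy("http://example.com/lost-page.html")
    print(cached or "no archived copy found")

Warrick generalizes this loop: per the abstract, it issues such lookups against several repositories of varying completeness, keeps a copy of each recovered resource, and crawls the recovered pages to discover the rest of the site.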