RIAO
2007

A Co-operative Web Services Paradigm for Supporting Crawlers

The traditional crawlers used by search engines to build their collection of Web pages frequently gather unmodified pages that already exist in their collection. This creates unnecessary Internet traffic and wastes search engine resources during page collection and indexing. Generally, the crawlers are also unable to collect dynamic pages, causing them to miss valuable information, and they cannot easily detect deleted pages, resulting in outdated search engine collections. To address these issues, we propose a new Web services paradigm for Website/crawler interaction that is co-operative and exploits the information present in the Web logs and file system. Our system supports a querying mechanism wherein the crawler can issue queries to the Web service on the Website and then collect pages based on the information provided in response to the query. We present experimental results demonstrating that, when compared to traditional crawlers, this approach provides bandwidth savings, more...
Aravind Chandramouli, Susan Gauch
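
The querying mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' interface: the names `ChangeReport`, `query_changes`, and `plan_crawl`, the in-memory `site_index`, and the use of a last-crawl timestamp as the query are all assumptions made for illustration. The sketch only shows the general idea that the Website answers a crawler's query with the pages that changed and the pages that were deleted, so the crawler fetches nothing else.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ChangeReport:
    modified: List[str]  # URLs that are new or changed since the last crawl
    deleted: List[str]   # URLs the crawler holds that no longer exist on the site


def query_changes(site_index: Dict[str, float],
                  known_urls: Set[str],
                  last_crawl: float) -> ChangeReport:
    """Hypothetical site-side query handler (an assumption, not the paper's API).

    site_index maps each URL to its last-modified time in seconds since the
    epoch, as a site might derive it from its file system or Web logs.
    """
    modified = sorted(url for url, mtime in site_index.items()
                      if mtime > last_crawl or url not in known_urls)
    deleted = sorted(url for url in known_urls if url not in site_index)
    return ChangeReport(modified=modified, deleted=deleted)


def plan_crawl(report: ChangeReport) -> List[str]:
    """Crawler side: fetch only what the site reported as new or changed."""
    return list(report.modified)
```

A crawler that last visited at time 200.0 and already holds `/a.html`, `/b.html`, and `/old.html` would call `query_changes(site_index, known_urls, 200.0)`, fetch only the URLs in `modified`, and drop the URLs in `deleted` from its collection.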
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2007
Where RIAO