A Co-operative Web Services Paradigm for Supporting Crawlers



The traditional crawlers used by search engines to build their collection of Web pages frequently gather unmodified pages that already exist in their collection. This creates unnecessary Internet traffic and wastes search engine resources during page collection and indexing. Generally, the crawlers are also unable to collect dynamic pages, causing them to miss valuable information, and they cannot easily detect deleted pages, resulting in outdated search engine collections. To address these issues, we propose a new Web services paradigm for Website/crawler interaction that is co-operative and exploits the information present in the Web logs and file system. Our system supports a querying mechanism wherein the crawler can issue queries to the Web service on the Website and then collect pages based on the information provided in response to the query. We present experimental results demonstrating that, when compared to traditional crawlers, this approach provides bandwidth savings, …
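The abstract does not describe the query interface itself, so the following is only a minimal sketch of how a site-side service might answer a crawler's question "what has changed since my last visit?" using the two information sources the abstract names. It assumes the site keeps a snapshot of the page paths it served at the previous crawl, knows each file's modification time, and writes its access log in Apache Common Log Format. The names `answer_query` and `CrawlResponse`, and the rule that a successful GET for a URL with a query string counts as a dynamic page, are hypothetical illustrations, not the authors' design.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Apache Common Log Format: host ident user [time] "METHOD URL PROTOCOL" status bytes
CLF = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')


@dataclass
class CrawlResponse:
    """Hypothetical reply sent from the Website's service to the crawler."""
    modified: list[str] = field(default_factory=list)  # static pages changed since last crawl
    deleted: list[str] = field(default_factory=list)   # pages the crawler should drop
    dynamic: list[str] = field(default_factory=list)   # dynamic URLs seen in the Web log


def answer_query(since: datetime,
                 current: dict[str, datetime],
                 previous: set[str],
                 access_log: list[str]) -> CrawlResponse:
    """Hypothetical handler: report changes since `since` (a timezone-aware datetime).

    current    -- page path -> file-system modification time (timezone-aware)
    previous   -- page paths present when the crawler last visited
    access_log -- raw Common Log Format lines
    """
    resp = CrawlResponse()

    # File system: pages whose modification time is newer than the last crawl.
    resp.modified = sorted(p for p, mtime in current.items() if mtime > since)

    # Deleted pages: in the previous snapshot but no longer on disk.
    resp.deleted = sorted(previous - set(current))

    # Web log: successful GETs of query-string URLs (assumed dynamic) after `since`.
    dynamic = set()
    for line in access_log:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed log lines
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        method, url, status = m.group(2), m.group(3), m.group(4)
        if ts > since and method == "GET" and status == "200" and "?" in url:
            dynamic.add(url)
    resp.dynamic = sorted(dynamic)
    return resp
```

Under these assumptions, a crawler would fetch only the URLs in `modified` and `dynamic` and remove the `deleted` ones from its index, rather than re-downloading every page on the site.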

Aravind Chandramouli, Susan Gauch


Information Technology | RIAO 2007 | Search Engine | Traditional Crawlers | Web Page


Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2007
Where	RIAO
Authors	Aravind Chandramouli, Susan Gauch
