Accelerated focused crawling through online relevance feedback

The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page w.r.t. an information need, based on information limited to the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier, and reduce the number of irrelevant pages which are fetched and discarded. We show that there is indeed a great deal of usable information on a HREF source page about the relevance of the target page. This information, encoded suitably, can be exploited by a supervised apprentice which takes online lessons from a traditional focused crawler by observing a carefully designed set of f...
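The idea of fine-tuning the priority of unvisited URLs in the crawl frontier can be illustrated with a small sketch. This is not the authors' apprentice: the abstract is truncated before it describes the feature set, so the example substitutes a deliberately simple, hypothetical score (the fraction of topic terms found in a link's anchor text). The names `score_link` and `Frontier`, the topic terms, and the example.org URLs are all assumptions made for illustration.

```python
import heapq
import itertools
import re


def score_link(anchor_text, topic_terms):
    """Hypothetical relevance estimate: fraction of topic terms in the anchor text."""
    if not topic_terms:
        return 0.0
    tokens = set(re.findall(r"[a-z0-9]+", anchor_text.lower()))
    return len(tokens & topic_terms) / len(topic_terms)


class Frontier:
    """Crawl frontier that pops the highest-scoring unvisited URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: earlier pushes win

    def push(self, url, score):
        if url in self._seen:  # never enqueue the same URL twice
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Links as they might be extracted from one HREF source page (made-up data).
topic = {"focused", "crawling", "relevance"}
links = [
    ("http://example.org/a", "Cheap flights and hotels"),
    ("http://example.org/b", "Focused crawling with relevance feedback"),
    ("http://example.org/c", "Crawling basics"),
]

frontier = Frontier()
for url, text in links:
    frontier.push(url, score_link(text, topic))

# Fetch order: most promising link first.
order = [frontier.pop()[0] for _ in range(len(frontier))]
print(order)
```

In a system along the lines the abstract describes, the hand-written `score_link` would presumably be replaced by a learned model trained online from the focused crawler's own relevance judgments; the priority-queue frontier would stay the same.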

Soumen Chakrabarti, Kunal Punera, Mallela Subramanyam

HREF Source Page | HREF Target Page | Internet Technology | Traditional Focused Crawler | WWW 2002

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2002
Where	WWW
Authors	Soumen Chakrabarti, Kunal Punera, Mallela Subramanyam
