Retrieving Web Pages Using Content, Links, URLs and Anchors


For this year's web track, we concentrated on the entry page finding task. For the content-only runs, in both the ad hoc task and the entry page finding task, we used an information retrieval system based on a simple unigram language model. In the ad hoc task, we experimented with alternative approaches to smoothing. For the entry page task, we incorporated additional information into the model. The sources of information we used in addition to the document's content are links, URLs and anchors. We found that almost every approach can improve the results of a content-only run. In the end, a very basic approach, using the depth of the path of the URL as a prior, yielded by far the largest improvement over the content-only results.
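The idea of combining a unigram language model with a URL-depth prior can be illustrated with a minimal sketch. The abstract does not give the smoothing method, its parameters, or the form of the prior, so everything specific here is an assumption: linear interpolation smoothing with a hypothetical weight `lam`, a hypothetical prior proportional to 1 / (1 + depth), and the illustrative function names `url_depth`, `log_depth_prior`, `log_query_likelihood` and `rank`.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_depth(url):
    """Count the non-empty path segments of a URL (0 for a site root)."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def log_depth_prior(url):
    # Hypothetical prior (an assumption, not the paper's estimate):
    # proportional to 1 / (1 + depth), so shallow URLs are favoured.
    return -math.log(1 + url_depth(url))


def log_query_likelihood(query_terms, doc_terms, coll_counts, coll_size, lam=0.5):
    """Log P(query | document) under a unigram model.

    Smoothing is linear interpolation with the collection model;
    lam is a hypothetical weight, not a value taken from the paper.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_size if coll_size else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # Term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank URLs by content score plus log prior. docs maps URL -> text."""
    tokenized = {url: text.lower().split() for url, text in docs.items()}
    coll_counts = Counter(t for terms in tokenized.values() for t in terms)
    coll_size = sum(coll_counts.values())
    query_terms = query.lower().split()
    scored = [
        (
            log_query_likelihood(query_terms, terms, coll_counts, coll_size, lam)
            + log_depth_prior(url),
            url,
        )
        for url, terms in tokenized.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


if __name__ == "__main__":
    pages = {
        "http://example.org/a/b/page.html": "acme corp home",
        "http://example.org/": "acme corp home",
    }
    print(rank("acme home", pages))
```

With identical content on both pages, the content scores tie and the prior alone decides the order, which is the effect the abstract attributes to URL depth in entry page finding. Link and anchor evidence are left out of this sketch.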

Thijs Westerveld, Wessel Kraaij, Djoerd Hiemstra

Ad Hoc Task | Entry Page | Page Finding Task | TREC 2001 | TREC 2008

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	TREC
Authors	Thijs Westerveld, Wessel Kraaij, Djoerd Hiemstra
