TREC
2004

Language Models for Searching in Web Corpora

Abstract: We describe our participation in the TREC 2004 Web and Terabyte tracks. For the Web track, we employ mixture language models based on document full text, incoming anchor text, and document titles, with a range of web-centric priors. We provide a detailed analysis of the effect of document length, URL structure, and link topology on relevance. The resulting web-centric priors are applied to three types of topics (distillation, home page, and named page) and improve effectiveness for all topic types, as well as for the mixed query set. For the Terabyte track, we experimented with building an index based only on the document titles, or only on the incoming anchor texts. Such highly selective indexing leads to a compact index that is effective in terms of early precision, catering to typical web searcher behavior.
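
The abstract does not spell out how the three document representations and the priors are combined. As a minimal sketch, assuming a linear interpolation of per-representation maximum-likelihood estimates with a collection background model, multiplied by a document prior, the scoring could look like the following. The function names, the weight values, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mle(term, tokens):
    """Maximum-likelihood estimate of P(term | tokens); 0.0 for empty text."""
    return Counter(tokens)[term] / len(tokens) if tokens else 0.0


def mixture_score(query, doc, collection, weights, prior=1.0):
    """Return log(prior * prod_t P(t | doc)) under a linear mixture model.

    query:      list of query terms.
    doc:        dict mapping a representation name (e.g. "text", "anchor",
                "title") to its token list.
    collection: token list used as a background model for smoothing.
    weights:    dict with one weight per representation plus "background";
                assumed (not checked) to sum to 1.
    prior:      positive document prior, e.g. derived from URL structure
                (an assumption made for illustration).
    """
    log_score = math.log(prior)
    for term in query:
        # Start from the background model, then add each representation.
        p = weights["background"] * mle(term, collection)
        for name, tokens in doc.items():
            p += weights[name] * mle(term, tokens)
        if p == 0.0:
            # Term unseen in every representation and in the collection.
            return float("-inf")
        log_score += math.log(p)
    return log_score


# Toy usage with made-up data and weights.
collection = ["trec", "web", "track", "home", "page", "search"]
weights = {"text": 0.4, "anchor": 0.2, "title": 0.2, "background": 0.2}
doc = {
    "text": ["web", "search", "results"],
    "anchor": ["trec", "home"],
    "title": ["trec", "web", "track"],
}
print(mixture_score(["trec", "web"], doc, collection, weights, prior=2.0))
```

Working in log space avoids underflow for longer queries, and the prior enters as a simple additive term, so different web-centric priors can be swapped in without touching the content model.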

Jaap Kamps, Gilad Mishne, Maarten de Rijke

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2004
Where	TREC
Authors	Jaap Kamps, Gilad Mishne, Maarten de Rijke
