Positional language models for information retrieval

Although many variants of language models have been proposed for information retrieval, there are two related retrieval heuristics that remain “external” to the language modeling approach: (1) the proximity heuristic, which rewards a document in which the matched query terms occur close to each other; and (2) passage retrieval, which scores a document mainly based on its best-matching passage. Existing studies have only attempted to use a standard language model as a “black box” to implement these heuristics, making it hard to optimize the combination parameters. In this paper, we propose a novel positional language model (PLM) that implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document and to score a document based on the scores of its PLMs. The PLM is estimated from counts of words propagated within a document through a proximity-based density function, which both captures the proximity heuristic and achieves an e...
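The idea of a per-position language model built from kernel-propagated word counts can be illustrated with a small sketch. It assumes a Gaussian propagation kernel, Dirichlet-prior smoothing against a collection model, and scoring a document by its single best position. These choices, the parameters `sigma` and `mu`, and the helper names (`gaussian_kernel`, `propagated_counts`, `collection_model`, `position_score`, `plm_score`) are illustrative assumptions, since the truncated abstract does not specify them. This is not the authors' implementation.

```python
import math
from collections import Counter


def gaussian_kernel(i: int, j: int, sigma: float) -> float:
    """Weight with which a term at position j propagates to position i.
    A Gaussian shape is an assumption made for this sketch."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def propagated_counts(doc: list[str], i: int, sigma: float) -> Counter:
    """c'(w, i): every occurrence of w contributes kernel-weighted mass to position i."""
    counts: Counter = Counter()
    for j, w in enumerate(doc):
        counts[w] += gaussian_kernel(i, j, sigma)
    return counts


def collection_model(docs: list[list[str]]) -> dict[str, float]:
    """Maximum-likelihood background model p(w | C) over the whole collection."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def position_score(query: list[str], doc: list[str], i: int, sigma: float,
                   mu: float, p_c: dict[str, float], floor: float = 1e-9) -> float:
    """Query log-likelihood under the Dirichlet-smoothed PLM at position i.
    With a maximum-likelihood query model this ranks documents the same way
    as negative KL divergence. `floor` (assumed) avoids log(0) for query
    words that never occur in the collection."""
    counts = propagated_counts(doc, i, sigma)
    z = sum(counts.values())  # total propagated mass at position i
    score = 0.0
    for q in query:
        p = (counts[q] + mu * p_c.get(q, floor)) / (z + mu)
        score += math.log(p)
    return score


def plm_score(query: list[str], doc: list[str], p_c: dict[str, float],
              sigma: float, mu: float) -> float:
    """Best-position strategy (assumed): a document scores as its best position."""
    return max(position_score(query, doc, i, sigma, mu, p_c) for i in range(len(doc)))


if __name__ == "__main__":
    # Two documents with the same bag of words; only the query-term distance differs.
    near = ["q1", "q2"] + ["f"] * 20
    far = ["q1"] + ["f"] * 20 + ["q2"]
    p_c = collection_model([near, far])
    for name, d in (("near", near), ("far", far)):
        print(name, plm_score(["q1", "q2"], d, p_c, sigma=2.0, mu=10.0))
```

With a small `sigma`, the document whose query terms are adjacent scores higher, which reflects the proximity heuristic. As `sigma` grows very large, every position sees the whole document equally and the two scores converge, so the model degrades toward an ordinary bag-of-words document model. Taking the maximum over positions mirrors the passage-retrieval intuition of scoring by the best-matching region. The quadratic loop over positions is kept for clarity rather than speed.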

Yuanhua Lv, ChengXiang Zhai