Corpus structure, language models, and ad hoc information retrieval

Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest of these typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Language models, clustering, smoothing

General Terms: Algorithms, Experiments

Keywords: language modeling, aspect models, interpolation model, clustering, smoothing, cluster-based language models
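The abstract does not give the scoring formulas, so the following is only a rough illustration of the general idea of enhancing a document language model with information from a cluster of similar documents. It shows one generic way to do this: a query-likelihood score in which each document's Dirichlet-smoothed unigram model is linearly interpolated with the smoothed model of the cluster that contains it. This is not presented as the authors' interpolation algorithm. The function names, the hard assignment of each document to exactly one cluster, and the parameter values `lam` and `mu` are all illustrative assumptions.

```python
import math
from collections import Counter


def dirichlet_lm(counts, length, collection_prob, mu):
    """Return a function computing p(w | text) with Dirichlet smoothing toward the collection model."""
    def prob(word):
        return (counts[word] + mu * collection_prob(word)) / (length + mu)
    return prob


def cluster_interpolated_scores(docs, cluster_of, query, lam=0.7, mu=1000.0):
    """Hypothetical sketch: score each document as
    sum over query terms w of log(lam * p(w|d) + (1 - lam) * p(w|cluster(d))).

    docs: list of token lists; cluster_of: list mapping doc index -> cluster id.
    lam and mu are illustrative defaults, not tuned or published values.
    """
    # Collection model: maximum-likelihood unigram estimate over all documents.
    coll = Counter()
    for doc in docs:
        coll.update(doc)
    total = sum(coll.values())

    def p_coll(word):
        return coll[word] / total

    # Pool the text of every document in a cluster into one "cluster document".
    cluster_counts = {}
    for idx, doc in enumerate(docs):
        cluster_counts.setdefault(cluster_of[idx], Counter()).update(doc)
    cluster_lms = {
        cid: dirichlet_lm(c, sum(c.values()), p_coll, mu)
        for cid, c in cluster_counts.items()
    }

    # Skip query terms unseen in the whole collection: their probability is 0 everywhere.
    terms = [w for w in query if coll[w] > 0]
    scores = []
    for idx, doc in enumerate(docs):
        doc_lm = dirichlet_lm(Counter(doc), len(doc), p_coll, mu)
        clus_lm = cluster_lms[cluster_of[idx]]
        score = sum(math.log(lam * doc_lm(w) + (1 - lam) * clus_lm(w)) for w in terms)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["language", "model", "retrieval"],
        ["cluster", "language", "model"],
        ["cooking", "recipe", "pasta"],
        ["pasta", "sauce", "recipe"],
    ]
    cluster_of = [0, 0, 1, 1]
    scores = cluster_interpolated_scores(docs, cluster_of, ["language", "model"], mu=1.0)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    print(ranking)
```

Setting `lam=1.0` recovers plain document-only query likelihood. Smaller values let a document borrow probability mass from its cluster, so a document can still score well on a query term it lacks if similar documents in its cluster contain that term.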

Oren Kurland, Lillian Lee