This paper addresses the issue of devising a new document prior for the language modeling (LM) approach for Information Retrieval. The prior is based on term statistics, derived in...
The inclusion of document length factors has been a major topic in the development of retrieval models. We believe that current models can be further improved by more refined est...
We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes...
Abstract. Document length is widely recognized as an important factor for adjusting retrieval systems. Many models tend to favor the retrieval of either short or long documents and...
Nonparametric Bayesian models are often based on the assumption that the objects being modeled are exchangeable. While appropriate in some applications (e.g., bag-ofwords models f...
Kurt T. Miller, Thomas L. Griffiths, Michael I. Jo...