Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into acc...
We describe a simple improvement to n-gram language models where we estimate the distribution over closed-class (function) words separately from the conditional distribution of ope...
We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) fo...
Shaojun Wang, Shaomin Wang, Russell Greiner, Dale ...
We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a comput...
This paper addresses the issue of devising a new document prior for the language modeling (LM) approach to information retrieval. The prior is based on term statistics, derived in...