Sciweavers

125 search results - page 6 / 25
» Smoothing a Tera-word Language Model
SIGIR
2004
ACM
14 years 3 months ago
Corpus structure, language models, and ad hoc information retrieval
Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into acc...
Oren Kurland, Lillian Lee
NLPRS
2001
Springer
14 years 2 months ago
A Simple Closed-Class/Open-Class Factorization for Improved Language Modeling
We describe a simple improvement to n-gram language models where we estimate the distribution over closed-class (function) words separately from the conditional distribution of ope...
Fuchun Peng, Dale Schuurmans
ICML
2005
IEEE
14 years 10 months ago
Exploiting syntactic, semantic and lexical regularities in language modeling via directed Markov random fields
We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) fo...
Shaojun Wang, Shaomin Wang, Russell Greiner, Dale ...
ICASSP
2010
IEEE
13 years 10 months ago
Power law discounting for n-gram language models
We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a comput...
Songfang Huang, Steve Renals
ECIR
2008
Springer
13 years 11 months ago
Probabilistic Document Length Priors for Language Models
This paper addresses the issue of devising a new document prior for the language modeling (LM) approach for Information Retrieval. The prior is based on term statistics, derived in...
Roi Blanco, Alvaro Barreiro