Sciweavers

» Smoothing a Tera-word Language Model
SIGIR
2004
ACM
15 years 9 months ago
Corpus structure, language models, and ad hoc information retrieval
Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into acc...
Oren Kurland, Lillian Lee
NLPRS
2001
Springer
15 years 8 months ago
A Simple Closed-Class/Open-Class Factorization for Improved Language Modeling
We describe a simple improvement to n-gram language models where we estimate the distribution over closed-class (function) words separately from the conditional distribution of ope...
Fuchun Peng, Dale Schuurmans
ICML
2005
IEEE
16 years 4 months ago
Exploiting syntactic, semantic and lexical regularities in language modeling via directed Markov random fields
We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) fo...
Shaojun Wang, Shaomin Wang, Russell Greiner, Dale ...
ICASSP
2010
IEEE
15 years 4 months ago
Power law discounting for n-gram language models
We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a comput...
Songfang Huang, Steve Renals
ECIR
2008
Springer
15 years 5 months ago
Probabilistic Document Length Priors for Language Models
This paper addresses the issue of devising a new document prior for the language modeling (LM) approach for Information Retrieval. The prior is based on term statistics, derived in...
Roi Blanco, Alvaro Barreiro