Exploiting syntactic, semantic and lexical regularities in language modeling via directed Markov random fields


We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) for statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified EM method, the generalized inside-outside algorithm, which extends the inside-outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in cases where there are hidden variables. We also derive an analogous algorithm to calculate the probability of an initial subsequence of a sentence generated by the composite language model. Our experimental results on the Wall Street Journal corpus show that we obtain significant reductions in...
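For orientation, the classical inside pass that the inside-outside algorithm is built on can be sketched as below. This is a minimal illustration of the standard textbook algorithm for a PCFG in Chomsky normal form, not the paper's generalized version and not the composite n-gram/PCFG/PLSA model. The function name, the dictionary layout of the rules, and the toy grammar are all assumptions made for this example. The three nested loops over span, start, and split point are what make the running time cubic in sentence length.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Return P(words) under a PCFG in Chomsky normal form.

    lexical: dict mapping (A, word) -> P(A -> word)   (hypothetical layout)
    binary:  dict mapping (A, B, C) -> P(A -> B C)    (hypothetical layout)
    """
    n = len(words)
    # beta[(i, j)][A] = probability that nonterminal A derives words[i:j]
    beta = defaultdict(lambda: defaultdict(float))

    # Width-1 spans come from the lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1)][A] += p

    # Wider spans combine two adjacent sub-spans at every split point k.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = beta[(i, k)], beta[(k, j)]
                for (A, B, C), p in binary.items():
                    if B in left and C in right:
                        beta[(i, j)][A] += p * left[B] * right[C]

    return beta[(0, n)][start]


# Toy grammar, invented purely for illustration.
toy_lexical = {("NP", "dogs"): 0.5, ("NP", "cats"): 0.5, ("V", "chase"): 1.0}
toy_binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

print(inside_probability(["dogs", "chase", "cats"], toy_lexical, toy_binary))  # 0.25
```

The toy sentence has a single parse, so its probability is the product of the rule probabilities used: 0.5 for each noun phrase and 1.0 for every other rule.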



Baseline Trigram Model | Composite Language Model | Directed Mrf Model | ICML 2005 | Machine Learning |


Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2005
Where	ICML
Authors	Shaojun Wang, Shaomin Wang, Russell Greiner, Dale Schuurmans, Li Cheng
