Factored Language Models and Generalized Parallel Backoff

We introduce factored language models (FLMs) and generalized parallel backoff (GPB). An FLM represents words as bundles of features (e.g., morphological classes, stems, data-driven clusters, etc.), and induces a probability model covering sequences of bundles rather than just words. GPB extends standard backoff to general conditional probability tables where variables might be heterogeneous types, where no obvious natural (temporal) backoff order exists, and where multiple dynamic backoff strategies are allowed. These methodologies were implemented during the JHU 2002 workshop as extensions to the SRI language modeling toolkit. This paper provides initial perplexity results on both CallHome Arabic and on Penn Treebank Wall Street Journal articles. Significantly, FLMs with GPB can produce bigrams with significantly lower perplexity, sometimes lower than highly-optimized baseline trigrams. In a multi-pass speech recognition context, where bigrams are used to create first-pass bigram l...
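To make the backoff idea in the abstract concrete, here is a toy Python sketch. It is not the SRI toolkit implementation and not the authors' code. It assumes a word is predicted from two parent factors (previous word, previous morphological tag), that an unseen event backs off in parallel to every context with one parent dropped, and that the paths are combined with `max`. The names `train`, `prob`, `D`, the absolute-discount value, and the tiny data set are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

D = 0.5  # absolute discount; an arbitrary value chosen for this sketch


def train(events):
    """Count each child under every subset of its parent factors.

    events: list of (child, parents) where parents is a tuple of factor values.
    A context is a tuple of (parent_index, value) pairs in index order.
    """
    counts, totals = Counter(), Counter()
    for child, parents in events:
        items = tuple(enumerate(parents))
        for r in range(len(items) + 1):
            for ctx in combinations(items, r):
                counts[ctx, child] += 1
                totals[ctx] += 1
    return counts, totals


def prob(child, ctx, model, vocab, combine=max):
    """P(child | ctx) with parallel backoff over all one-parent-dropped contexts."""
    counts, totals = model
    n, c = totals[ctx], counts[ctx, child]
    unseen = [w for w in vocab if counts[ctx, w] == 0]
    if not unseen:          # every word was seen here: no mass to redistribute
        return c / n
    if c > 0:               # seen event: discounted relative frequency
        return (c - D) / n
    if not ctx:             # root of the backoff graph: share freed mass uniformly
        return D * (len(vocab) - len(unseen)) / (n * len(unseen))

    def g(w):
        # Combine the estimates from every backoff path (drop one parent each).
        return combine([prob(w, ctx[:i] + ctx[i + 1:], model, vocab, combine)
                        for i in range(len(ctx))])

    # Backoff weight chosen so the distribution over vocab sums to one.
    seen_mass = sum((counts[ctx, w] - D) / n for w in vocab if counts[ctx, w] > 0)
    alpha = (1.0 - seen_mass) / sum(g(w) for w in unseen)
    return alpha * g(child)


# Hypothetical training data: (word, (previous word, previous tag)).
events = [
    ("house", ("the", "DET")), ("car", ("the", "DET")),
    ("house", ("a", "DET")), ("dog", ("the", "DET")),
    ("runs", ("dog", "NOUN")), ("runs", ("cat", "NOUN")),
]
vocab = sorted({"house", "car", "dog", "runs", "cat"})
model = train(events)
ctx = tuple(enumerate(("a", "DET")))
print({w: round(prob(w, ctx, model, vocab), 4) for w in vocab})
```

In this sketch "car" never follows "a", yet it still receives probability because the tag-only context (`DET`) has seen it. Swapping `combine=max` for a mean or another function changes how the parallel paths are merged without changing the normalization step.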

Jeff Bilmes, Katrin Kirchhoff

Conditional Probability Tables | FLM Represents Words | Lower Perplexity | NAACL 2003 | NAACL 2007 |

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	NAACL
Authors	Jeff Bilmes, Katrin Kirchhoff
