We describe a simple improvement to n-gram language models in which we estimate the distribution over closed-class (function) words separately from the conditional distribution of open-class words given function words. In English, function words account for about 30% of written language and form a natural skeleton for most sentences. By factoring a language model into a function-word model and a conditional model over open-class words given function words, we largely avoid the problem of sparse training data in the first phase and localize the need for sophisticated smoothing techniques primarily to the second, conditional model. We test our factored approach on the Brown and Wall Street Journal corpora and observe a 3.5% to 25.2% improvement in perplexity over standard methods, depending on the particular smoothing method and test set used. Compared to other proposals for improving n-gram language models, our factorization has the advantage of inherent simplicity and efficiency, a...
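To make the factorization concrete, the following is a minimal sketch, not the paper's implementation: a toy bigram formulation with add-one smoothing, in which a sentence's score is the product of a skeleton model (function words plus a generic content-word placeholder) and a conditional model of each content word given its preceding skeleton token. The function-word list, the `<C>` placeholder, and the single-token conditioning context are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

# Hypothetical, tiny closed-class inventory; a real system would use a full list.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was"}
CONTENT = "<C>"  # placeholder marking an open-class (content) word slot


def skeleton(tokens):
    """Map a sentence to its function-word skeleton: content words become <C>."""
    return [t if t in FUNCTION_WORDS else CONTENT for t in tokens]


class FactoredBigram:
    """Toy factored model: P(sentence) ~= P(skeleton) * P(content words | skeleton context)."""

    def __init__(self):
        self.skel_bigrams = Counter()     # (prev, cur) counts over skeleton tokens
        self.skel_histories = Counter()   # prev counts over skeleton tokens
        self.cond_counts = defaultdict(Counter)  # skeleton context -> content-word counts
        self.content_vocab = set()

    def train(self, sentences):
        for tokens in sentences:
            skel = ["<s>"] + skeleton(tokens) + ["</s>"]
            for prev, cur in zip(skel, skel[1:]):
                self.skel_bigrams[(prev, cur)] += 1
                self.skel_histories[prev] += 1
            # Conditional model: content word given the preceding skeleton token.
            padded = ["<s>"] + tokens
            for prev, cur in zip(padded, padded[1:]):
                if cur not in FUNCTION_WORDS:
                    ctx = prev if (prev in FUNCTION_WORDS or prev == "<s>") else CONTENT
                    self.cond_counts[ctx][cur] += 1
                    self.content_vocab.add(cur)

    def logprob(self, tokens):
        """Add-one smoothed log-probability under the two-factor model."""
        lp = 0.0
        # Factor 1: skeleton (function-word) bigram model -- dense, little sparsity.
        skel = ["<s>"] + skeleton(tokens) + ["</s>"]
        v_skel = len(FUNCTION_WORDS) + 3  # function words + <C>, <s>, </s>
        for prev, cur in zip(skel, skel[1:]):
            lp += math.log((self.skel_bigrams[(prev, cur)] + 1)
                           / (self.skel_histories[prev] + v_skel))
        # Factor 2: content words conditioned on skeleton context -- needs real smoothing.
        padded = ["<s>"] + tokens
        v_content = len(self.content_vocab) + 1
        for prev, cur in zip(padded, padded[1:]):
            if cur not in FUNCTION_WORDS:
                ctx = prev if (prev in FUNCTION_WORDS or prev == "<s>") else CONTENT
                counts = self.cond_counts[ctx]
                lp += math.log((counts[cur] + 1) / (sum(counts.values()) + v_content))
        return lp


if __name__ == "__main__":
    model = FactoredBigram()
    model.train([["the", "cat", "sat", "on", "the", "mat"],
                 ["a", "dog", "was", "in", "the", "garden"]])
    print(model.logprob(["the", "dog", "sat", "on", "a", "mat"]))
```

The sketch uses add-one smoothing in both factors only for brevity; the point of the factorization is that the skeleton model is already well estimated from modest data, so more sophisticated smoothing can be concentrated on the conditional content-word model.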