Statistical language models should improve as the n-gram size increases from 3 to 5 or higher. However, the number of parameters, the amount of computation, and the storage requirement all grow very rapidly if we attempt to store all possible n-gram combinations. To avoid these problems, the reduced n-gram approach previously developed by O'Boyle (1993) can be applied. A reduced n-gram language model can store the full phrase-history length of an entire corpus within feasible storage limits. A further theoretical advantage of reduced n-grams is that they are closer to being semantically complete than traditional models, which include all n-grams. In our experiments, the reduced n-gram Zipf curves are first presented and compared with those previously obtained for conventional n-grams in both English and Chinese. The reduced n-gram model is then applied to large English and Chinese corpora. For English, we can reduce the model sizes, compared to 7-gram traditional model sizes, with fact...
Le Quan Ha, Philip Hanna, Darryl Stewart, F. Jack Smith
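To make the Zipf-curve comparison mentioned in the abstract concrete, the following is a minimal sketch, not the authors' reduced n-gram implementation, of how rank-frequency data for conventional n-grams can be computed from a corpus. It assumes a whitespace-tokenised plain-text corpus; the file name `corpus.txt` is hypothetical.

```python
# Minimal sketch: conventional n-gram counts and Zipf (rank-frequency) data.
# Assumes a whitespace-tokenised plain-text corpus; "corpus.txt" is hypothetical.
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def zipf_curve(counts):
    """Return (rank, frequency) pairs sorted by descending frequency."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


if __name__ == "__main__":
    tokens = open("corpus.txt", encoding="utf-8").read().split()
    for n in (1, 2, 3):
        curve = zipf_curve(ngram_counts(tokens, n))
        # Plotted on log-log axes (log rank vs. log frequency), unigram counts
        # approximately follow Zipf's law; the paper compares such conventional
        # n-gram curves with those produced by the reduced n-gram model.
        print(f"{n}-grams, top 5 ranks:", curve[:5])
```

The reduced n-gram model itself differs from this sketch in that it stores only a reduced set of phrases rather than every overlapping n-gram, which is what keeps its storage requirement feasible for long phrase histories.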