Traditional wisdom holds that once documents are turned into bag-of-words (unigram count) vectors, word order is completely lost. We introduce an approach that, perhaps surprisingly, is able to learn a bigram language model from a set of bag-of-words documents. At its heart, our approach is an EM algorithm that seeks a model maximizing the regularized marginal likelihood of the bag-of-words documents. In experiments on seven corpora, we observed that our learned bigram language models: i) achieve better test-set perplexity than unigram models trained on the same bag-of-words documents, and are not far behind "oracle bigram models" trained on the corresponding ordered documents; ii) assign higher probabilities to sensible bigram word pairs; iii) improve the accuracy of ordered-document recovery from a bag of words. Our approach opens the door to novel phenomena, for example, privacy leakage from index files.
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, Robert Nowak
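To make the objective described above concrete, the following is a minimal sketch of the kind of quantity such an EM algorithm optimizes. The notation here ($\mathbf{v}$ for a bag-of-words document, $\sigma(\mathbf{v})$ for the set of word orderings consistent with it, $\theta$ for the bigram parameters, and $R(\theta)$ with weight $\lambda$ for a generic regularizer) is assumed for illustration and is not necessarily the paper's exact formulation.

% Marginal likelihood of a single bag-of-words document v under bigram
% parameters \theta: sum over every ordering z consistent with the bag,
% with z_0 taken to be a document-start symbol (an assumption).
\[
P(\mathbf{v} \mid \theta) \;=\; \sum_{\mathbf{z} \in \sigma(\mathbf{v})} \; \prod_{t=1}^{|\mathbf{z}|} P(z_t \mid z_{t-1}, \theta)
\]
% Regularized objective over a corpus of D bags; EM ascends this by treating
% the hidden ordering z of each document as the latent variable.
\[
\hat{\theta} \;=\; \arg\max_{\theta} \; \sum_{d=1}^{D} \log P(\mathbf{v}_d \mid \theta) \;-\; \lambda\, R(\theta)
\]

In this reading, the E-step reasons about plausible orderings of each bag and the M-step re-estimates the bigram parameters from those orderings, which is what allows word-pair statistics to be recovered from unordered counts.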