Statistical topic models such as Latent Dirichlet Allocation (LDA) have emerged as an attractive framework to model, visualize, and summarize large document collections in a completely unsupervised fashion. One limitation of this family of models is the assumption of exchangeability of words within documents, which results in a 'bag-of-words' representation for documents as well as topics. As a consequence, valuable information that exists in the form of correlations between words is lost in these models. In this work, we adapt recent advances in sparse modeling techniques to the problem of modeling word correlations within topics and present a new algorithm called Sparse Word Graphs. Our experiments on the AP corpus reveal both long-distance and short-distance word correlations within topics that are semantically very meaningful. In addition, the new algorithm is highly scalable to large collections, as it captures only the most important correlations in a sparse manner.
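To make the idea of keeping only the most important word correlations concrete, here is a minimal toy sketch (not the authors' Sparse Word Graphs algorithm): it counts within-document co-occurrences for words assumed to belong to one topic, then thresholds the counts to retain a sparse set of strong edges. The documents, vocabulary size, and percentile cutoff are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical toy data: documents as lists of word indices,
# assumed (for illustration) to be assigned to a single topic.
docs = [
    [0, 1, 2, 1],
    [0, 2, 3],
    [1, 2, 3, 3],
]
vocab_size = 4

# Count within-document co-occurrences of distinct word types.
C = np.zeros((vocab_size, vocab_size))
for doc in docs:
    for i in doc:
        for j in doc:
            if i != j:
                C[i, j] += 1

# Sparsity by thresholding: keep only the strongest correlations.
# (The paper uses sparse modeling techniques; this cutoff is a
# stand-in to show the sparse-graph output format.)
threshold = np.percentile(C[C > 0], 75)
edges = [(i, j, C[i, j]) for i in range(vocab_size)
         for j in range(i + 1, vocab_size) if C[i, j] >= threshold]
print(edges)
```

The resulting `edges` list is a sparse word graph over the vocabulary: each entry is a pair of word indices plus an association weight, and only edges above the cutoff survive, which is what keeps the representation scalable to large collections.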
Ramesh Nallapati, Amr Ahmed, William W. Cohen, Eri