Sciweavers

ACL
2015

Gaussian LDA for Topic Models with Word Embeddings

8 years 8 months ago
Gaussian LDA for Topic Models with Word Embeddings
Continuous space word embeddings learned from large, unstructured corpora have been shown to be effective at capturing semantic regularities in language. In this paper we replace LDA’s parameterization of “topics” as categorical distributions over opaque word types with multivariate Gaussian distributions on the embedding space. This encourages the model to group words that are a priori known to be semantically related into topics. To perform inference, we introduce a fast collapsed Gibbs sampling algorithm based on Cholesky decompositions of covariance matrices of the posterior predictive distributions. We further derive a scalable algorithm that draws samples from stale posterior predictive distributions and corrects them with a Metropolis–Hastings step. Using vectors learned from a domain-general corpus (English Wikipedia), we report results on two document collections (20-newsgroups and NIPS). Qualitatively, Gaussian LDA infers different (but still very sensible) topics re...
Rajarshi Das, Manzil Zaheer, Chris Dyer
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ACL
Authors Rajarshi Das, Manzil Zaheer, Chris Dyer
Comments (0)