
KDD
2009
ACM

Efficient methods for topic model inference on streaming document collections

Topic models provide a powerful tool for analyzing large text collections by representing high dimensional data in a low dimensional subspace. Fitting a topic model given a set of training documents requires approximate inference techniques that are computationally expensive. With today's large-scale, constantly expanding document collections, it is useful to be able to infer topic distributions for new documents without retraining the model. In this paper, we empirically evaluate the performance of several methods for topic inference in previously unseen documents, including methods based on Gibbs sampling, variational inference, and a new method inspired by text classification. The classification-based inference method produces results similar to iterative inference methods, but requires only a single matrix multiplication. In addition to these inference methods, we present SparseLDA, an algorithm and data structure for evaluating Gibbs sampling distributions. Empirical results ...
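The abstract says the classification-based method needs only a single matrix multiplication. The sketch below illustrates that shape of computation under stated assumptions: a (V × K) word-topic weight matrix has already been learned on the training collection (how it is learned is not described here), the weights are nonnegative, and each document's scores are normalized to sum to 1. The function name `classify_topics` and the normalization step are hypothetical choices, not the paper's exact procedure.

```python
import numpy as np


def classify_topics(doc_word_counts: np.ndarray, word_topic_weights: np.ndarray) -> np.ndarray:
    """Estimate topic proportions for a batch of unseen documents (hypothetical helper).

    doc_word_counts: (D, V) bag-of-words counts for the new documents.
    word_topic_weights: (V, K) nonnegative weights, assumed to have been learned
        beforehand on the training collection.
    Returns a (D, K) array whose nonempty rows sum to 1.
    """
    # The single matrix multiplication: every document scored against every topic at once.
    scores = doc_word_counts @ word_topic_weights
    totals = scores.sum(axis=1, keepdims=True)
    # Normalize each row; documents with zero total score (e.g. empty ones) stay all-zero.
    return np.divide(scores, totals, out=np.zeros_like(scores, dtype=float), where=totals > 0)
```

For example, with counts `[[2, 0, 1]]` and weights `[[1, 0], [0.5, 0.5], [0, 1]]`, the scores are `[2, 1]`, so the estimated proportions are `[2/3, 1/3]`. Because no per-document iteration is involved, a whole batch of new documents costs one sparse or dense matrix product.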
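The abstract does not spell out how SparseLDA speeds up Gibbs sampling. The sketch below assumes the standard collapsed Gibbs conditional for LDA, p(z = t | ·) ∝ (α_t + n_{t|d})(β + n_{w|t}) / (βV + n_t). It splits that weight into three buckets whose sum is exactly the full weight: a smoothing bucket s (dense over topics but rarely changing), a document bucket r (nonzero only for topics already in the document), and a topic-word bucket q (nonzero only for topics in which the word occurs). A uniform draw over s + r + q then walks the buckets, touching only the nonzero entries of r and q. The function names and dict-based counts are hypothetical; unlike this sketch, a practical sampler would cache s and the denominators instead of recomputing them for every token.

```python
import random
from typing import Dict, List, Tuple

Bucket = List[Tuple[int, float]]


def bucket_terms(doc_topic: Dict[int, int], word_topic: Dict[int, int],
                 topic_totals: List[int], alpha: List[float], beta: float,
                 vocab_size: int) -> Tuple[Bucket, Bucket, Bucket]:
    """Split the unnormalized Gibbs weights into the s, r and q buckets."""
    denom = [beta * vocab_size + n_t for n_t in topic_totals]
    # s: smoothing-only mass, dense over all topics (recomputed here for clarity).
    s = [(t, alpha[t] * beta / denom[t]) for t in range(len(topic_totals))]
    # r: document mass, nonzero only for topics already present in document d.
    r = [(t, n * beta / denom[t]) for t, n in doc_topic.items() if n > 0]
    # q: topic-word mass, nonzero only for topics in which word type w currently occurs.
    q = [(t, (alpha[t] + doc_topic.get(t, 0)) * n / denom[t])
         for t, n in word_topic.items() if n > 0]
    return s, r, q


def dense_weights(doc_topic: Dict[int, int], word_topic: Dict[int, int],
                  topic_totals: List[int], alpha: List[float], beta: float,
                  vocab_size: int) -> List[float]:
    """Reference: (alpha_t + n_{t|d}) (beta + n_{w|t}) / (beta V + n_t) for every topic."""
    return [(alpha[t] + doc_topic.get(t, 0)) * (beta + word_topic.get(t, 0))
            / (beta * vocab_size + topic_totals[t]) for t in range(len(topic_totals))]


def sample_topic(doc_topic: Dict[int, int], word_topic: Dict[int, int],
                 topic_totals: List[int], alpha: List[float], beta: float,
                 vocab_size: int, rng: random.Random) -> int:
    """Draw one topic for a token of word w in document d (counts exclude that token)."""
    s, r, q = bucket_terms(doc_topic, word_topic, topic_totals, alpha, beta, vocab_size)
    total = sum(weight for bucket in (s, r, q) for _, weight in bucket)
    u = rng.random() * total
    # Walk q, then r, then s; any fixed order samples the same distribution.
    for bucket in (q, r, s):
        for t, weight in bucket:
            if u < weight:
                return t
            u -= weight
    # Floating-point round-off can leave u marginally past the end.
    return len(topic_totals) - 1
```

`dense_weights` is included only as a check: summing each topic's contributions across the three buckets reproduces the dense weight, since (α_t + n_{t|d})(β + n_{w|t}) = α_t β + n_{t|d} β + (α_t + n_{t|d}) n_{w|t}.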
Limin Yao, David M. Mimno, Andrew McCallum
Added 25 Nov 2009
Updated 25 Nov 2009
Type Conference
Year 2009
Where KDD