Probabilistic author-topic models for information discovery

We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented by a probability distribution over words. The words in a multi-author paper are assumed to result from a mixture of each author's topic mixture. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 2002, parsing ...
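The two-stage generative process described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy sizes, the symmetric Dirichlet priors, and the uniform choice among a paper's co-authors are all assumptions made for the example. It shows only how words would be generated, not the Markov chain Monte Carlo inference step.

```python
import random


def sample_dirichlet(rng, dim, concentration):
    """Draw one point from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, doc_authors, author_topic, topic_word, n_words):
    """Generate (author, topic, word) triples for one multi-author document."""
    n_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    tokens = []
    for _ in range(n_words):
        # Assumption: each word picks one of the co-authors uniformly at random.
        author = rng.choice(doc_authors)
        # Stage 1: draw a topic from that author's distribution over topics.
        topic = rng.choices(range(n_topics), weights=author_topic[author])[0]
        # Stage 2: draw a word from that topic's distribution over words.
        word = rng.choices(range(vocab_size), weights=topic_word[topic])[0]
        tokens.append((author, topic, word))
    return tokens


# Toy sizes (hypothetical; the abstract reports 300 topics and 85,000 authors).
N_AUTHORS, N_TOPICS, VOCAB_SIZE = 4, 3, 10

rng = random.Random(0)
author_topic = [sample_dirichlet(rng, N_TOPICS, 0.5) for _ in range(N_AUTHORS)]
topic_word = [sample_dirichlet(rng, VOCAB_SIZE, 0.5) for _ in range(N_TOPICS)]

# A 20-word document written by authors 0 and 2.
doc = generate_document(rng, [0, 2], author_topic, topic_word, n_words=20)
```

Each entry of `doc` records which author and topic produced a word. In the learning setting only the words and the author list are observed, and the author and topic assignments are the latent variables that the sampler would have to infer.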

Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, Thomas L. Griffiths

Data Mining | KDD 2004 | Specific Topic | Topic Mixture | Unsupervised Learning Technique |

» P3coupon A probabilistic system for Prompt and Privacy-preserving electronic coupon distrib...

» A Categorical Model for Discovering Latent Structure in Social Annotations

» Topical N-Grams Phrase and Topic Discovery with an Application to Information Retrieval

» Knowledge discovery of semantic relationships between words using nonparametric Bayesian g...

» Knowledge discovery of multiple-topic document using parametric mixture model with dirichle...

» Probabilistic models for discovering e-communities

» Bayesian hierarchical model for transcriptional module discovery by jointly modeling gene ...

» A Probabilistic Model for Fine-Grained Expert Search

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2004
Where	KDD
Authors	Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, Thomas L. Griffiths
