ICDM 2007 (IEEE)

Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical n-grams, a topic model that discovers topics as well as topical phrases. The probabilistic model generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topic-specific unigram or bigram distribution. Thus our model can capture “white house” as a phrase with a special meaning in the ‘politics’ topic, but not in the ‘real estate’ topic. Successive bigrams form longer phrases. We present experimental results showing meaningful phrases and more interpretable topics from the NIPS data, and improved information retrieval performance on a TREC collection.
Xuerui Wang, Andrew McCallum, Xing Wei
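The generative story in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation; it only mirrors the three sampling steps (topic, unigram/bigram status, word) on toy parameters, and the names theta, phi, sigma, and psi are assumed notation rather than the paper's exact symbols.

```python
# Minimal sketch (assumed notation, toy parameters) of the topical n-gram
# generative story: sample a topic, sample unigram/bigram status, then sample
# the word from a topic-specific unigram or bigram distribution.
import numpy as np

rng = np.random.default_rng(0)

V = 5                                        # toy vocabulary size
K = 2                                        # number of topics
theta = np.array([0.7, 0.3])                 # per-document topic proportions
phi = rng.dirichlet(np.ones(V), K)           # topic-specific unigram distributions
sigma = rng.dirichlet(np.ones(V), (K, V))    # topic- and previous-word-specific bigram distributions
psi = np.full((K, V), 0.4)                   # P(bigram status = 1 | topic, previous word)

def generate(n_words):
    """Generate a toy word-index sequence following the three sampling steps."""
    words, prev = [], 0
    for _ in range(n_words):
        z = rng.choice(K, p=theta)           # 1. sample a topic
        x = rng.random() < psi[z, prev]      # 2. sample unigram/bigram status
        if x and words:                      # 3a. bigram: condition on previous word
            w = rng.choice(V, p=sigma[z, prev])
        else:                                # 3b. unigram: topic-specific distribution
            w = rng.choice(V, p=phi[z])
        words.append(w)
        prev = w
    return words

print(generate(10))
```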
Type: Conference
Year: 2007
Where: ICDM
Authors: Xuerui Wang, Andrew McCallum, Xing Wei