A latent topic model for linked documents

Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly account for the document relations represented by links, in this paper we propose a citation-topic (CT) model, which assumes a probabilistic generative process for corpora. In the CT model, a given document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is related to the given document. Moreover, the CT model contains a random process for selecting the related documents according to the structure of the generative model determined by links; therefore, the transitivity of the relations among documents is captured. We apply the CT model to the document clustering task, and experimental comparisons against several state-of-the-art approaches demonstrate very promising performance.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Retrieva...
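The generative process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the exact random process, so the link-following walk with a stopping probability (`stop_prob`), the function names, and the array layout (`doc_topic`, `topic_word`, `links`) are all assumptions made for illustration. The sketch only shows the general idea that each word's topic is drawn from the topic distribution of a document reached through links, which lets indirectly linked documents contribute.

```python
import numpy as np


def sample_related_doc(start, links, stop_prob, rng):
    """Follow outgoing links from `start`; at each step stop with
    probability `stop_prob`. Multi-hop walks are what let indirectly
    linked documents be selected (an assumed stand-in for the paper's
    random process)."""
    if not 0.0 < stop_prob <= 1.0:
        raise ValueError("stop_prob must be in (0, 1]")
    current = start
    while links[current] and rng.random() >= stop_prob:
        current = int(rng.choice(links[current]))
    return current


def generate_document(doc, n_words, links, doc_topic, topic_word,
                      stop_prob=0.5, seed=0):
    """Generate word ids for `doc` (hypothetical helper).

    links      : list of lists, links[d] = documents that d links to
    doc_topic  : (n_docs, n_topics) array, rows sum to 1
    topic_word : (n_topics, vocab_size) array, rows sum to 1
    """
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = topic_word.shape
    words = []
    for _ in range(n_words):
        # 1. pick a related document by walking the link structure
        source = sample_related_doc(doc, links, stop_prob, rng)
        # 2. borrow a topic from that document's topic distribution
        topic = rng.choice(n_topics, p=doc_topic[source])
        # 3. draw a word from the chosen topic
        words.append(int(rng.choice(vocab_size, p=topic_word[topic])))
    return words


# Toy corpus: document 0 links to 1, and 1 links to 2.
links = [[1], [2], []]
doc_topic = np.array([[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]])
topic_word = np.array([[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
sample = generate_document(0, 20, links, doc_topic, topic_word)
```

In this toy setup, document 0 can emit words from the topic that dominates document 2 even though it links only to document 1, which is one simple way to picture the transitivity the abstract mentions.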

Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Y

CT Model | Document | Document Clustering | Information Retrieval | SIGIR 2009 |

Related Content

» Latent Topic Models for Hypertext

» A Topic Model for Linked Documents and Update Rules for its Estimation

» Cross-language linking of news stories on the web using interlingual topic modelling

» Joint latent topic models for text and citations

» Topic models with power-law using Pitman-Yor process

» HTM: A Topic Model for Hypertexts

» Bayesian Folding-In with Dirichlet Kernels for PLSI

» Probabilistic latent semantic visualization: topic model for visualizing documents

» Exploring the use of latent topical information for statistical Chinese spoken document re...

Post Info

Added	28 May 2010
Updated	28 May 2010
Type	Conference
Year	2009
Where	SIGIR
Authors	Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Yihong Gong

Sciweavers
