Sciweavers

ICPR
2010
IEEE

Unsupervised Learning from Linked Documents

Documents in many corpora, such as digital libraries and webpages, contain both content and link information. In traditional topic models, which play an important role in unsupervised learning, link information is either ignored entirely or treated as a feature similar to content. We believe that neither approach accurately captures the relations represented by links. To address this limitation of traditional topic models, we propose a citation-topic (CT) model that explicitly considers the document relations represented by links. In the CT model, links are not treated as yet another feature; instead, they form the structure of the generative model. As a result, a given document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is related to the given document. We apply the CT model to several document collections and the experimental comparisons against state-of-th...
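The mixture structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the model's parameterization, so the names below (citation_weights, topic_dists, word_dists), the toy data, and the three-step sampling order (pick a cited document, then a topic from that document's distribution, then a word) are all assumptions made for illustration.

```python
import random

def mixed_topic_distribution(citation_weights, topic_dists):
    """Topic distribution of a document as a mixture of the topic
    distributions of the documents it cites (hypothetical parameterization).

    citation_weights: dict cited_doc_id -> mixture weight (weights sum to 1)
    topic_dists:      dict doc_id -> list of topic probabilities
    """
    num_topics = len(next(iter(topic_dists.values())))
    mixed = [0.0] * num_topics
    for cited, weight in citation_weights.items():
        for z, p in enumerate(topic_dists[cited]):
            mixed[z] += weight * p
    return mixed

def generate_document(citation_weights, topic_dists, word_dists, length, rng):
    """Sample words for one document under the assumed generative process."""
    cited_ids = list(citation_weights)
    cited_w = [citation_weights[c] for c in cited_ids]
    words = []
    for _ in range(length):
        # 1. Choose which cited document to borrow a topic distribution from.
        c = rng.choices(cited_ids, weights=cited_w)[0]
        # 2. Draw a topic from the borrowed distribution.
        z = rng.choices(range(len(topic_dists[c])), weights=topic_dists[c])[0]
        # 3. Draw a word from that topic's word distribution.
        vocab = list(word_dists[z])
        w = rng.choices(vocab, weights=[word_dists[z][v] for v in vocab])[0]
        words.append(w)
    return words

# Toy data (invented for illustration): two cited documents, two topics.
topic_dists = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
word_dists = [
    {"pixel": 0.7, "image": 0.3},   # topic 0
    {"query": 0.6, "index": 0.4},   # topic 1
]
citation_weights = {"a": 0.5, "b": 0.5}

mixed = mixed_topic_distribution(citation_weights, topic_dists)
doc = generate_document(citation_weights, topic_dists, word_dists, 20, random.Random(0))
```

Here the citing document's effective topic distribution is simply the weighted average of the distributions of the documents it cites, which is one plain reading of "a mixture of a set of topic distributions, each of which is borrowed (cited) from a document that is related to the given document".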
Added 26 Jan 2011
Updated 26 Jan 2011
Type Journal
Year 2010
Where ICPR
Authors Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Yihong Gong