Unsupervised Learning from Linked Documents

Documents in many corpora, such as digital libraries and webpages, contain both content and link information. In traditional topic models, which play an important role in unsupervised learning, link information is either ignored entirely or treated as just another feature, like content. We believe that neither approach can accurately capture the relations that links represent. To address this limitation of traditional topic models, in this paper we propose a citation-topic (CT) model that explicitly models the document relations represented by links. In the CT model, links are not treated as yet another feature; instead, they form the structure of the generative model. As a result, each document is modeled as a mixture of a set of topic distributions, each of which is borrowed (cited) from a document related to the given document. We apply the CT model to several document collections and the experimental comparisons against state-of-th...
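The abstract describes the CT model's generative structure only at a high level. As a rough illustration, the following is a minimal sketch of how "borrowing" a topic distribution through a link could generate words. It assumes a simple three-step process per word (pick a linked document, draw a topic from that document's topic distribution, draw a word from the topic). The names `links`, `link_weights`, `theta` and `phi` are hypothetical, and the sketch does not reproduce the paper's priors, its inference procedure, or its exact parameterisation.

```python
import random


def generate_document(doc_id, links, link_weights, theta, phi, length, rng):
    """Sample `length` words for document `doc_id` (illustrative sketch only).

    links[d]        -- documents that d borrows (cites) topic distributions from
    link_weights[d] -- hypothetical mixing weights over links[d]
    theta[c]        -- topic distribution of document c (list of probabilities)
    phi[k]          -- word distribution of topic k (dict: word -> probability)
    """
    words = []
    for _ in range(length):
        # 1. Pick a related (cited) document according to d's mixing weights.
        c = rng.choices(links[doc_id], weights=link_weights[doc_id])[0]
        # 2. Borrow that document's topic distribution and draw a topic from it.
        k = rng.choices(range(len(theta[c])), weights=theta[c])[0]
        # 3. Draw a word from the chosen topic's word distribution.
        vocab = list(phi[k])
        w = rng.choices(vocab, weights=[phi[k][v] for v in vocab])[0]
        words.append(w)
    return words


if __name__ == "__main__":
    # Toy setup (hypothetical): two topics with disjoint vocabularies.
    phi = [{"graph": 0.5, "node": 0.5}, {"pixel": 0.5, "image": 0.5}]
    theta = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
    # Document "A" borrows only from "B", so its words come from B's topic.
    links = {"A": ["B"], "B": ["B"]}
    weights = {"A": [1.0], "B": [1.0]}
    print(generate_document("A", links, weights, theta, phi, 10, random.Random(0)))
```

In this toy setup, document "A" has its own topic distribution, but because its only link points to "B", every word it generates is drawn from B's topic. This is the intuition behind treating links as model structure rather than as an extra feature.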

Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Yihong Gong

Computer Vision | Document | ICPR 2010 | Link Information | Traditional Topic Models

Post Info

Added	26 Jan 2011
Updated	26 Jan 2011
Type	Journal
Year	2010
Where	ICPR
Authors	Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Yihong Gong