Sciweavers

CIKM
2009
Springer

Improving web page classification by label-propagation over click graphs

In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled similar documents. Current state-of-the-art classifiers are supervised and require large amounts of manually labeled data. We hypothesize that unlabeled documents similar to our positive and negative labeled documents tend to be clicked through by the same user queries. Our proposed method leverages this hypothesis and augments our training set by modeling the similarity between documents in a click graph. We experiment with three different web page classifiers and show empirical evidence that our proposed approach outperforms state-of-the-art methods and reduces the human effort needed to label training data.
Categories and Subject Descriptors: H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: Information Search and Retrieval – Relevance feedback, Selection process
General Terms: Algorithms, Measurement, Performanc...
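The abstract's core idea, that documents clicked for the same queries tend to share a class label, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a generic iterative label propagation over a bipartite query-document click graph. The function name `propagate_labels`, the (query, document, click count) input format, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, iterations=10):
    """Spread seed labels over a bipartite query-document click graph.

    clicks: iterable of (query, doc, count) tuples with positive counts
            (hypothetical input format).
    seeds:  dict mapping labeled docs to +1.0 (positive) or -1.0 (negative).
    Returns a dict mapping every clicked doc to a score in [-1, 1].
    """
    by_query = defaultdict(dict)  # query -> {doc: click count}
    by_doc = defaultdict(dict)    # doc -> {query: click count}
    for query, doc, count in clicks:
        by_query[query][doc] = by_query[query].get(doc, 0) + count
        by_doc[doc][query] = by_doc[doc].get(query, 0) + count

    # Unlabeled documents start at 0; seeds start at their label.
    doc_score = {doc: seeds.get(doc, 0.0) for doc in by_doc}

    for _ in range(iterations):
        # A query's score is the click-weighted mean of its documents' scores.
        query_score = {
            q: sum(doc_score[d] * c for d, c in docs.items()) / sum(docs.values())
            for q, docs in by_query.items()
        }
        # An unlabeled document's score is the click-weighted mean of its
        # queries' scores; seed documents stay clamped to their labels.
        for doc, queries in by_doc.items():
            if doc in seeds:
                continue
            doc_score[doc] = (
                sum(query_score[q] * c for q, c in queries.items())
                / sum(queries.values())
            )
    return doc_score


# Made-up click log: two queries, each with one labeled and one unlabeled page.
clicks = [
    ("cheap flights", "a.example", 5),
    ("cheap flights", "b.example", 3),
    ("python tutorial", "c.example", 4),
    ("python tutorial", "d.example", 2),
]
seeds = {"a.example": 1.0, "c.example": -1.0}
scores = propagate_labels(clicks, seeds)
print(scores)
```

Unlabeled pages whose score moves far enough from zero could then be added to the training set with the corresponding label; the cutoff would be a tuning choice and is not taken from the paper.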
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffney