Improving web page classification by label-propagation over click graphs

In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled similar documents. Current state-of-the-art classifiers are supervised and require large amounts of manually labeled data. We hypothesize that unlabeled documents similar to our positive and negative labeled documents tend to be clicked through by the same user queries. Our proposed method leverages this hypothesis and augments our training set by modeling the similarity between documents in a click graph. We experiment with three different web page classifiers and show empirical evidence that our proposed approach outperforms state-of-the-art methods and reduces the amount of human effort to label training data.

Categories and Subject Descriptors: H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: Information Search and Retrieval – Relevance feedback, Selection process

General Terms: Algorithms, Measurement, Performanc...
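
The abstract's core idea, that documents clicked through by the same queries tend to share a class, can be illustrated with a small sketch. The code below is not the authors' algorithm; it is a generic iterative label propagation over a bipartite query-document click graph, written under several assumptions: the click log is a list of hypothetical `(query, url, click_count)` tuples, seed labels are +1.0 (positive) and -1.0 (negative), and scores are spread by click-weighted averaging. The function name `propagate_labels` and the example URLs are invented for illustration.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, iterations=10):
    """Spread seed labels over a query-document click graph.

    clicks: iterable of (query, url, click_count) tuples (assumed log format).
    seeds:  dict mapping url -> +1.0 (positive) or -1.0 (negative).
    Returns a dict mapping every clicked url to a score in [-1, 1].
    """
    # Build both directions of the bipartite graph, summing click counts.
    query_to_docs = defaultdict(dict)
    doc_to_queries = defaultdict(dict)
    for query, url, count in clicks:
        query_to_docs[query][url] = query_to_docs[query].get(url, 0) + count
        doc_to_queries[url][query] = doc_to_queries[url].get(query, 0) + count

    # Unlabeled documents start at 0.0 (no evidence either way).
    doc_score = {url: seeds.get(url, 0.0) for url in doc_to_queries}

    for _ in range(iterations):
        # A query's score is the click-weighted mean of its documents' scores.
        query_score = {}
        for query, docs in query_to_docs.items():
            total = sum(docs.values())
            query_score[query] = sum(doc_score[d] * c for d, c in docs.items()) / total

        # A document's score is the click-weighted mean of its queries' scores;
        # seed documents are clamped to their original labels.
        updated = {}
        for url, queries in doc_to_queries.items():
            if url in seeds:
                updated[url] = seeds[url]
            else:
                total = sum(queries.values())
                updated[url] = sum(query_score[q] * c for q, c in queries.items()) / total
        doc_score = updated

    return doc_score


# Hypothetical click log: two queries, each reaching one seed and one unlabeled page.
clicks = [
    ("cheap flights", "a.example/flights", 5),
    ("cheap flights", "b.example/deals", 3),
    ("python tutorial", "c.example/py", 4),
    ("python tutorial", "d.example/learn", 2),
]
seeds = {"a.example/flights": 1.0, "c.example/py": -1.0}
scores = propagate_labels(clicks, seeds)
```

In this toy graph, `b.example/deals` shares a query with the positive seed and drifts toward a positive score, while `d.example/learn` drifts negative. Documents whose propagated score passes a chosen threshold could then be added to the training set; how the paper selects and weights such documents is not specified in the abstract.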

Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffney

CIKM 2009 | Database | Training Data | Unlabeled Similar Documents | Web Page Classification |

Post Info

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	CIKM
Authors	Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffney

Sciweavers
