NIPS
2008

Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization

Most semi-supervised learning (SSL) methods exploit the cluster assumption. However, if the unlabeled data is only weakly related to the target classes, it becomes questionable whether driving the decision boundary toward the low-density regions of the unlabeled data will help classification. In such a case, the cluster assumption may not be valid, and consequently how to leverage this type of unlabeled data to improve classification accuracy becomes a challenge. We introduce "Semi-supervised Learning with Weakly-Related Unlabeled Data" (SSLW), an inductive method that builds upon the maximum-margin approach to make better use of weakly-related unlabeled information. Although SSLW could improve a wide range of classification tasks, in this paper we focus on text categorization with a small training pool. The key assumption behind this work is that, even across different topics, the word usage patterns of different corpora tend to be consistent. To th...
Liu Yang, Rong Jin, Rahul Sukthankar
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where NIPS