
ICML
2002
IEEE

Combining Labeled and Unlabeled Data for MultiClass Text Categorization

Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled data, such as EM and Co-Training, are mostly applicable to classification tasks with a small number of classes and do not scale up well to large multiclass problems. In this paper, we develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems. We show that our method is especially useful for text classification tasks involving a large number of categories and outperforms other semi-supervised learning techniques such as EM...
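The ECOC decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the four class names, the 7-bit code matrix, and the helper names (`binary_targets`, `train_ecoc`, `decode`, `classify`) are all assumptions made for the example. The `train_binary` argument is a hypothetical hook where a semi-supervised binary learner such as Co-Training would be plugged in; no Co-Training code is shown here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical exhaustive code for four illustrative classes:
# one 7-bit codeword per class, one column per binary problem.
# Any two codewords differ in 4 positions, so one wrong bit is recoverable.
CODE_MATRIX: Dict[str, List[int]] = {
    "business": [1, 1, 1, 1, 1, 1, 1],
    "health":   [0, 0, 0, 0, 1, 1, 1],
    "science":  [0, 0, 1, 1, 0, 0, 1],
    "sports":   [0, 1, 0, 1, 0, 1, 0],
}
N_BITS = 7

# A predictor maps a document to a single bit (0 or 1).
Predictor = Callable[[str], int]
# A binary trainer takes labeled docs, their 0/1 targets, and unlabeled docs.
BinaryTrainer = Callable[[Sequence[str], Sequence[int], Sequence[str]], Predictor]


def binary_targets(labels: Sequence[str], column: int) -> List[int]:
    """Relabel multiclass labels as the 0/1 targets of one code-matrix column."""
    return [CODE_MATRIX[label][column] for label in labels]


def train_ecoc(
    docs: Sequence[str],
    labels: Sequence[str],
    unlabeled: Sequence[str],
    train_binary: BinaryTrainer,
) -> List[Predictor]:
    """Train one binary predictor per column of the code matrix.

    train_binary is an assumed interface; a Co-Training learner would
    use the unlabeled documents it receives, a plain learner may ignore them.
    """
    return [
        train_binary(docs, binary_targets(labels, column), unlabeled)
        for column in range(N_BITS)
    ]


def decode(bits: Sequence[int]) -> str:
    """Return the class whose codeword is nearest in Hamming distance."""
    def hamming(codeword: Sequence[int]) -> int:
        return sum(a != b for a, b in zip(codeword, bits))

    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label]))


def classify(doc: str, predictors: Sequence[Predictor]) -> str:
    """Run every binary predictor on the document and decode the bit string."""
    return decode([predict(doc) for predict in predictors])
```

Decoding by nearest codeword is what lets the ensemble tolerate mistakes by individual binary classifiers: with this particular matrix, a single flipped bit still decodes to the intended class.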
Rayid Ghani
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2002
Where ICML
Authors Rayid Ghani