Sciweavers

AAAI
1998

Learning to Classify Text from Labeled and Unlabeled Documents

In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper shows that the accuracy of text classifiers trained with a small number of labeled documents can be improved by augmenting this small training set with a large pool of unlabeled documents. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text based on the combination of Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents; it then trains a new classifier using the labels for all the documents, and iterates to convergence. Experimental results, obtained using text from three different real-world tasks, show that the use of un...
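The loop described in the abstract (train on the labeled documents, probabilistically label the unlabeled ones, retrain on everything, repeat) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes a multinomial naive Bayes model over tokenized documents, add-one smoothing, and a stopping rule based on the change in the soft labels. The function names (train_nb, posterior, em_naive_bayes, classify) and the max_iters and tol parameters are hypothetical.

```python
import math
from collections import defaultdict


def train_nb(docs, weights, classes, vocab):
    """M-step: fit class priors and word probabilities from fractional class weights."""
    prior, word_prob = {}, {}
    for c in classes:
        class_mass = sum(w[c] for w in weights)
        # Add-one (Laplace) smoothing is an assumption of this sketch.
        prior[c] = (1.0 + class_mass) / (len(classes) + len(docs))
        counts = defaultdict(float)
        for doc, w in zip(docs, weights):
            for tok in doc:
                counts[tok] += w[c]
        denom = len(vocab) + sum(counts.values())
        word_prob[c] = {t: (1.0 + counts[t]) / denom for t in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob, classes):
    """E-step for one document: P(class | doc) under the current model."""
    log_scores = {}
    for c in classes:
        score = math.log(prior[c])
        for tok in doc:
            if tok in word_prob[c]:  # tokens outside the vocabulary are skipped
                score += math.log(word_prob[c][tok])
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}


def em_naive_bayes(labeled, unlabeled, classes, max_iters=20, tol=1e-6):
    """labeled: list of (tokens, label); unlabeled: list of token lists."""
    vocab = sorted({t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d})
    lab_docs = [d for d, _ in labeled]
    # Labeled documents keep hard (0/1) class weights throughout.
    lab_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Initial classifier: labeled documents only.
    prior, word_prob = train_nb(lab_docs, lab_w, classes, vocab)
    prev = None
    for _ in range(max_iters):
        # E-step: probabilistically label the unlabeled documents.
        soft = [posterior(d, prior, word_prob, classes) for d in unlabeled]
        # M-step: retrain on labeled plus softly labeled documents.
        prior, word_prob = train_nb(lab_docs + unlabeled, lab_w + soft, classes, vocab)
        if prev is not None:
            change = max((abs(p[c] - q[c]) for p, q in zip(soft, prev) for c in classes),
                         default=0.0)
            if change < tol:
                break
        prev = soft
    return prior, word_prob


def classify(doc, prior, word_prob, classes):
    """Return the most probable class for a tokenized document."""
    post = posterior(doc, prior, word_prob, classes)
    return max(classes, key=lambda c: post[c])
```

In this sketch a word that appears only in unlabeled documents can still become informative, because it co-occurs there with words whose class association was learned from the labeled set.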
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom M. Mitchell
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1998
Where AAAI
Authors Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom M. Mitchell