ICML
1998
IEEE

Employing EM and Pool-Based Active Learning for Text Classification

This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool for explicitly estimating document density when selecting examples for labeling. Then active learning is combined with Expectation-Maximization in order to "fill in" the class labels of those documents that remain unlabeled. Experimental results show that the improvements to active learning require less than two-thirds as many labeled training examples as previous QBC approaches, and that the combination of EM and active learning requires only slightly more than half as many labeled training examples to achieve the same accuracy as either the improved active learning or EM alone.
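The density-weighted selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how committee disagreement or document density is measured, so the choices below are assumptions made for illustration only: disagreement is the mean KL divergence of each committee member's class posterior to the committee average, and density is the average similarity of a document to the rest of the pool. All function names, the toy posteriors, and the similarity matrix are hypothetical, and the EM step is not shown.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def committee_disagreement(member_posteriors):
    """Mean KL divergence of each member's class posterior to the committee mean.

    Assumed disagreement measure; the abstract does not specify one.
    """
    k = len(member_posteriors)
    n_classes = len(member_posteriors[0])
    mean = [sum(m[c] for m in member_posteriors) / k for c in range(n_classes)]
    return sum(kl_divergence(m, mean) for m in member_posteriors) / k

def density(index, similarity):
    """Average similarity of one pool document to every other pool document.

    Assumed density estimate; `similarity` is a precomputed square matrix.
    """
    others = [s for j, s in enumerate(similarity[index]) if j != index]
    return sum(others) / len(others)

def select_query(pool_posteriors, similarity):
    """Pick the pool document with the highest density * disagreement score."""
    scores = [
        density(i, similarity) * committee_disagreement(posteriors)
        for i, posteriors in enumerate(pool_posteriors)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy pool: 3 unlabeled documents, a committee of 2 classifiers, 2 classes.
pool_posteriors = [
    [[0.9, 0.1], [0.1, 0.9]],  # doc 0: strong disagreement, but an outlier
    [[0.7, 0.3], [0.3, 0.7]],  # doc 1: moderate disagreement, dense region
    [[0.6, 0.4], [0.6, 0.4]],  # doc 2: committee agrees, dense region
]
similarity = [
    [1.0, 0.1, 0.1],
    [0.1, 1.0, 0.9],
    [0.1, 0.9, 1.0],
]

best, scores = select_query(pool_posteriors, similarity)
disagreement_only = [committee_disagreement(p) for p in pool_posteriors]
print("selected document:", best)
```

In this toy pool, disagreement alone would favor document 0, while weighting by density shifts the choice to document 1, which sits in a more populated region of the pool. That is the intuition behind using the unlabeled pool to estimate density when choosing what to label.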
Andrew McCallum, Kamal Nigam
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 1998
Where ICML
Authors Andrew McCallum, Kamal Nigam