Employing EM and Pool-Based Active Learning for Text Classification

This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool for explicitly estimating document density when selecting examples for labeling. Then active learning is combined with Expectation-Maximization (EM) in order to "fill in" the class labels of those documents that remain unlabeled. Experimental results show that the improvements to active learning require less than two-thirds as many labeled training examples as previous QBC approaches, and that the combination of EM and active learning requires only slightly more than half as many labeled training examples to achieve the same accuracy as either the improved active learning or EM alone.
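
The density-weighted selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: committee disagreement is measured as the mean KL divergence of each member's class distribution to the committee average, density is approximated by average cosine similarity to the rest of the pool, and the function names (`kl_to_mean`, `density`, `select_query`) are hypothetical. The EM step is not shown.

```python
import math

def kl_to_mean(member_dists):
    """Mean KL divergence from each committee member's class
    distribution to the committee average (assumed disagreement measure)."""
    k = len(member_dists)
    n = len(member_dists[0])
    mean = [sum(d[c] for d in member_dists) / k for c in range(n)]
    total = 0.0
    for d in member_dists:
        for c in range(n):
            if d[c] > 0:  # mean[c] > 0 whenever d[c] > 0
                total += d[c] * math.log(d[c] / mean[c])
    return total / k

def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density(i, pool):
    """Average similarity of document i to every other pool document
    (an assumed stand-in for the density estimate)."""
    sims = [cosine(pool[i], pool[j]) for j in range(len(pool)) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select_query(pool, committee_predictions):
    """Return the index of the pool document with the highest
    density-weighted committee disagreement."""
    scores = [
        density(i, pool) * kl_to_mean(committee_predictions[i])
        for i in range(len(pool))
    ]
    return max(range(len(pool)), key=scores.__getitem__)

# Toy pool: documents 0 and 1 are near-duplicates, document 2 is an outlier.
pool = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
# Per document, the class distribution predicted by each of two committee members.
committee_predictions = [
    [[0.9, 0.1], [0.9, 0.1]],  # members agree
    [[0.9, 0.1], [0.1, 0.9]],  # members disagree, dense region
    [[0.9, 0.1], [0.1, 0.9]],  # members disagree, sparse region
]
chosen = select_query(pool, committee_predictions)
```

In this toy setup, documents 1 and 2 draw equal disagreement, and the density weight is what separates them: the document lying in the denser region of the pool is preferred over the outlier.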

Andrew McCallum, Kamal Nigam

Active Learning | ICML 1998 | Machine Learning | Unlabeled Documents | Unlabeled Pool

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	1998
Where	ICML
Authors	Andrew McCallum, Kamal Nigam
