A Comparison of Models for Cost-Sensitive Active Learning


Active Learning (AL) is a selective sampling strategy which has been shown to be particularly cost-efficient by drastically reducing the amount of training data to be manually annotated. For the annotation of natural language data, cost efficiency is usually measured in terms of the number of tokens to be considered. This measure, assuming uniform costs for all tokens involved, is, from a linguistic perspective at least, intrinsically inadequate and should be replaced by a more adequate cost indicator, viz. the time it takes to manually label selected annotation examples. We here propose three different approaches to incorporate costs into the AL selection mechanism and evaluate them on the MUC7T corpus, an extension of the MUC7 newspaper corpus that contains such annotation time information. Our experiments reveal that using a cost-sensitive version of semi-supervised AL, up to 54% of true annotation time can be saved compared to random selection.
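To make the idea of cost-aware selection concrete, here is a minimal sketch of one generic way to fold annotation time into an AL selection step: rank candidates by utility per second and pick greedily within a time budget. This is an illustration only and is not claimed to be any of the three approaches the abstract refers to. The `Candidate` class, the `utility` and `cost_seconds` fields, the `select_by_benefit_cost` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sentence_id: str
    utility: float       # hypothetical informativeness score from the learner
    cost_seconds: float  # hypothetical estimated annotation time, must be > 0


def select_by_benefit_cost(candidates, budget_seconds):
    """Greedily pick the candidates with the highest utility per second
    of annotation time, skipping any that would exceed the time budget."""
    ranked = sorted(
        candidates,
        key=lambda c: c.utility / c.cost_seconds,
        reverse=True,
    )
    selected, spent = [], 0.0
    for c in ranked:
        if spent + c.cost_seconds <= budget_seconds:
            selected.append(c)
            spent += c.cost_seconds
    return selected, spent


# Toy pool with made-up scores and times (not data from the paper).
pool = [
    Candidate("s1", utility=0.9, cost_seconds=30.0),
    Candidate("s2", utility=0.6, cost_seconds=10.0),
    Candidate("s3", utility=0.5, cost_seconds=5.0),
    Candidate("s4", utility=0.2, cost_seconds=20.0),
]
chosen, spent = select_by_benefit_cost(pool, budget_seconds=20.0)
print([c.sentence_id for c in chosen], spent)
```

In this toy pool, the sentence with the highest raw utility ("s1") is not selected, because its estimated annotation time is large relative to what it offers. A token-count measure with uniform costs would not capture that trade-off.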

Katrin Tomanek, Udo Hahn


Annotation | Annotation Time | COLING 2010 | Computational Linguistics | True Annotation Time


Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Katrin Tomanek, Udo Hahn


Sciweavers
