Reducing class imbalance during active learning for named entity annotation

In many natural language processing tasks, the classes of interest are heavily imbalanced in the underlying data set, and classifiers trained on such skewed data tend to perform poorly on low-frequency classes. We introduce and compare different approaches to reduce class imbalance by design within the context of active learning (AL). Our goal is to compile more balanced data sets up front, at annotation time, when AL is used as a strategy to acquire training material. We situate our approach in the context of named entity recognition. Our experiments reveal that we can indeed reduce class imbalance and increase the performance of classifiers on minority classes while preserving good overall performance in terms of macro F-score.

Categories and Subject Descriptors: I.2.6 [Computing Methodologies]: Artificial Intelligence - Learning; I.2.7 [Computing Methodologies]: Artificial Intelligence - Natural Language Processing

General Terms: Algorithms, Design,...
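The abstract reports results in terms of macro F-score. As a rough illustration of why that metric is sensitive to minority classes, here is a minimal Python sketch. It scores token-level labels rather than full entity spans, and the label names `PER` and `LOC` and the toy data are assumptions made for this example, not taken from the paper.

```python
def per_class_f1(gold, pred, label):
    """F1 for a single label, computed from token-level counts."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this label
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the labels present in gold."""
    labels = sorted(set(gold))
    return sum(per_class_f1(gold, pred, lab) for lab in labels) / len(labels)


def accuracy(gold, pred):
    """Fraction of tokens labelled correctly."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Hypothetical skewed data: 9 majority-class tokens, 1 minority-class token.
gold = ["PER"] * 9 + ["LOC"]
# A classifier that always predicts the majority class.
pred = ["PER"] * 10

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

Because every class contributes equally to the average, a classifier that ignores the minority class keeps a high accuracy but a low macro F-score. This is why the metric suits a comparison of strategies for reducing class imbalance.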

Katrin Tomanek, Udo Hahn

Class Imbalance | Data Sets | Information Retrieval | KCAP 2009 | Language Processing

Post Info

Added	28 May 2010
Updated	28 May 2010
Type	Conference
Year	2009
Where	KCAP
Authors	Katrin Tomanek, Udo Hahn
