
CORR
2000
Springer

Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

This paper describes a new method, COMBI-BOOTSTRAP, that exploits existing taggers and lexical resources to annotate corpora with new tagsets. COMBI-BOOTSTRAP uses the existing resources as features for a second-level machine learning module, which is trained on a very small sample of annotated corpus material to perform the mapping to the new tagset. Experiments show that COMBI-BOOTSTRAP (i) can integrate a wide variety of existing resources and (ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample.
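The stacking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of learner (a toy nearest-neighbour classifier with a simple overlap metric), the class name SecondLevelTagger, the feature layout (word, output of tagger A, output of tagger B), and all tags and data below are assumptions made up for illustration.

```python
from collections import Counter


def overlap(a, b):
    """Count the feature positions at which two feature tuples agree."""
    return sum(x == y for x, y in zip(a, b))


class SecondLevelTagger:
    """Toy k-nearest-neighbour learner (hypothetical, for illustration only).

    It maps the outputs of existing taggers, used as features,
    to tags from a new tagset.
    """

    def __init__(self, k=1):
        self.k = k
        self.examples = []  # stored (feature tuple, new-tagset label) pairs

    def fit(self, features, labels):
        self.examples = list(zip(features, labels))
        return self

    def predict_one(self, feats):
        # Rank stored examples by overlap with the query, most similar first.
        ranked = sorted(
            self.examples,
            key=lambda ex: overlap(ex[0], feats),
            reverse=True,
        )
        # Majority vote among the k most similar examples.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical small annotated sample: each item is
# (word, tag from existing tagger A, tag from existing tagger B),
# paired with the hand-assigned tag in the new tagset.
train_X = [
    ("the", "DT", "ART"),
    ("dog", "NN", "N"),
    ("runs", "VBZ", "V"),
    ("a", "DT", "ART"),
    ("cat", "NN", "N"),
]
train_y = ["Det", "Noun", "Verb", "Det", "Noun"]

tagger = SecondLevelTagger(k=1).fit(train_X, train_y)

# An unseen word is still mapped to the new tagset, because the
# existing taggers' outputs match those of stored examples.
print(tagger.predict_one(("bird", "NN", "N")))
print(tagger.predict_one(("sleeps", "VBZ", "V")))
```

In this sketch the existing taggers are treated as black boxes whose output tags are just feature values, so resources with different tagsets can be combined without aligning those tagsets by hand.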
Jakub Zavrel, Walter Daelemans
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2000
Where CORR
Authors Jakub Zavrel, Walter Daelemans