Analysing Wikipedia and Gold-Standard Corpora for NER Training

Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named-entity-annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making the corpora more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold-standard corpora on cross-corpus evaluation by up to 11%.
Joel Nothman, Tara Murphy, James R. Curran
Added: 24 Nov 2009
Updated: 24 Nov 2009
Type: Conference
Year: 2009
Where: EACL