OntoNotes: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation



Annotated corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify exemplars on which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations that includes word senses, predicate-argument structure, ontology linking, and coreference. To find mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. We evaluate the performance of WSD from three aspects: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline is better in other cases. The two methods can be combined to improve the cleanup process. Thi...
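The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that an instance is "suspicious" when a WSD system's predicted sense differs from the sense the annotators agreed on, that word ambiguity is measured as Shannon entropy over the annotated sense distribution, and that precision is the fraction of flagged instances a human reviewer confirms as wrong. The names `Instance`, `sense_entropy`, `flag_suspicious`, `precision`, and the toy predictor are all hypothetical and are not taken from the paper or from OntoNotes tooling.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, NamedTuple, Set


class Instance(NamedTuple):
    """One annotated occurrence of a word (hypothetical structure)."""
    word: str
    context: str
    annotated_sense: str  # the sense the annotators agreed on


def sense_entropy(senses: Iterable[str]) -> float:
    """Shannon entropy (in bits) of a sense distribution.

    Assumption: higher entropy stands in for "more ambiguous word".
    """
    counts = Counter(senses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_suspicious(
    instances: Iterable[Instance],
    predict: Callable[[Instance], str],
) -> List[Instance]:
    """Return instances whose predicted sense differs from the annotation.

    Assumption: disagreement between the WSD output and the agreed
    annotation is what marks a candidate for human re-evaluation.
    """
    return [inst for inst in instances if predict(inst) != inst.annotated_sense]


def precision(flagged: List[Instance], confirmed_wrong: Set[Instance]) -> float:
    """Fraction of flagged candidates that a reviewer confirmed as errors."""
    if not flagged:
        return 0.0
    return sum(inst in confirmed_wrong for inst in flagged) / len(flagged)


def toy_predict(inst: Instance) -> str:
    """Keyword rule standing in for a real WSD system (illustration only)."""
    return "bank.river" if "river" in inst.context else "bank.finance"


corpus = [
    Instance("bank", "she deposited money at the bank", "bank.finance"),
    Instance("bank", "they sat on the river bank", "bank.finance"),
    Instance("bank", "the river bank was muddy", "bank.river"),
]

candidates = flag_suspicious(corpus, toy_predict)
ambiguity = sense_entropy(inst.annotated_sense for inst in corpus)
```

Here `candidates` holds only the second instance, which would then go to a human reviewer. A real setup would replace `toy_predict` with a trained WSD classifier and would compare its precision against the baseline per word, since the abstract reports that the better method depends on how ambiguous the word is.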

Liang-Chih Yu, Chung-Hsien Wu, Eduard H. Hovy


Annotated Corpora | COLING 2008 | Computational Linguistics | Most Large-scale Annotation | Word Sense |


Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	COLING
Authors	Liang-Chih Yu, Chung-Hsien Wu, Eduard H. Hovy
