(Semi-)Automatic Detection of Errors in PoS-Tagged Corpora

This paper presents a simple yet, in practice, very efficient technique for automatically detecting those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on learning and later applying "negative bigrams", i.e. on searching for pairs of adjacent tags that constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). The paper further describes the generalization of "negative bigrams" into "negative n-grams", for any natural n, which provides a powerful tool for error detection in a corpus. The implementation is also discussed, as well as an evaluation of the results of the approach when used for error detection in the NEGRA
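The negative-bigram idea can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how negative bigrams are learned, so the sketch makes the simplifying assumption that any tag pair never observed adjacent in a trusted training sample counts as negative. The function names, the tag labels, and the toy sentences are all hypothetical.

```python
from itertools import product


def learn_negative_bigrams(sentences):
    """Collect tag pairs that never occur adjacent in the training data.

    Assumption (not from the paper): "never seen" is treated as "impossible".
    Each sentence is a list of PoS tags.
    """
    tags = set()
    seen = set()
    for sent in sentences:
        tags.update(sent)
        seen.update(zip(sent, sent[1:]))  # all adjacent tag pairs
    return set(product(tags, repeat=2)) - seen


def suspect_positions(sentence, negative):
    """Return indices i where (tag[i], tag[i+1]) is a negative bigram."""
    return [i for i, pair in enumerate(zip(sentence, sentence[1:]))
            if pair in negative]


# Toy training sample with hypothetical tags.
train = [
    ["ART", "NOUN", "VFIN"],
    ["ART", "ADJ", "NOUN", "VFIN"],
    ["PRON", "VFIN", "ART", "NOUN"],
]
negative = learn_negative_bigrams(train)

# A tagged sentence containing the suspicious pair ARTICLE - FINITE VERB.
tagged = ["ART", "VFIN", "ART", "NOUN"]
hits = suspect_positions(tagged, negative)
print(hits)
```

The flagged positions are only candidates for manual inspection, which matches the "(semi-)automatic" framing of the title. Extending the sketch from pairs to longer windows would follow the same pattern, though how the paper actually constructs negative n-grams is not stated in this abstract.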

Pavel Kveton, Karel Oliva

COLING 2002 | COLING 2008 | Efficient Technique | Error Detection | Negative Bigrams |

Post Info

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2002
Where	COLING
Authors	Pavel Kveton, Karel Oliva
