ACL
1996

Using Parsed Corpora for Structural Disambiguation in the TRAINS Domain

This paper describes a prototype disambiguation module, KANKEI, which was tested on two corpora of the TRAINS project. In ambiguous verb phrases of the form V ... NP PP or V ... NP adverb(s), the two corpora have very different PP and adverb attachment patterns: in the first, the correct attachment is to the VP 88.7% of the time, while in the second, the correct attachment is to the NP 73.5% of the time. KANKEI uses various n-gram patterns of the phrase heads around these ambiguities, and assigns each parse tree containing such an ambiguity a score based on a linear combination of the frequencies with which these patterns appear with NP and VP attachments in the TRAINS corpora. Unlike previous statistical disambiguation systems, this technique thus combines evidence from bigrams, trigrams, and the 4-gram around an ambiguous attachment. In the current experiments, equal weights are used for simplicity, but results are still good on the TRAINS corpora (92.2% and 92.4% accuracy). Despite the large st...
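The scoring idea in the abstract can be illustrated with a minimal Python sketch. This is not KANKEI's implementation: the four head slots (verb, NP head, preposition, PP object head), the use of relative frequencies, the tie-breaking default, and all names (`head_patterns`, `AttachmentScorer`) are assumptions made for illustration only. It shows one way to combine bigram, trigram, and 4-gram evidence with equal weights.

```python
from collections import Counter
from itertools import combinations

# Hypothetical head slots around the ambiguity (an assumption, not from the paper).
SLOTS = ("verb", "noun", "prep", "obj")


def head_patterns(heads):
    """Return every bigram, trigram, and the 4-gram over the four heads.

    Each pattern keeps its slot labels, so a verb+prep pair is never
    confused with a noun+prep pair that happens to share the same words.
    """
    items = tuple(zip(SLOTS, heads))
    return [combo for n in (2, 3, 4) for combo in combinations(items, n)]


class AttachmentScorer:
    """Toy scorer: equal-weight sum of per-pattern relative frequencies."""

    def __init__(self):
        self.counts = {"NP": Counter(), "VP": Counter()}

    def train(self, examples):
        # examples: iterable of ((verb, noun, prep, obj), "NP" or "VP")
        for heads, attachment in examples:
            for pattern in head_patterns(heads):
                self.counts[attachment][pattern] += 1

    def score(self, heads, attachment, weight=1.0):
        # Linear combination: every pattern contributes with the same weight.
        total = 0.0
        for pattern in head_patterns(heads):
            seen = self.counts["NP"][pattern] + self.counts["VP"][pattern]
            if seen:  # patterns never seen in training contribute nothing
                total += weight * self.counts[attachment][pattern] / seen
        return total

    def predict(self, heads, default="VP"):
        # Fall back to a default attachment when the scores tie (assumed policy).
        np_score = self.score(heads, "NP")
        vp_score = self.score(heads, "VP")
        if np_score == vp_score:
            return default
        return "NP" if np_score > vp_score else "VP"
```

A scorer like this would be trained on (heads, attachment) pairs extracted from a parsed corpus and then asked to `predict` the attachment for an unseen verb phrase. Using relative frequencies rather than raw counts is a design choice of this sketch; the abstract only states that pattern frequencies are combined linearly.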
Mark G. Core
Type Conference
Year 1996
Where ACL
Authors Mark G. Core