Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank

In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas, 14348 semantic form types (an average of 4 per lemma) with 577 frame types. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource.
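The idea of associating probabilities with frames conditional on the lemma can be illustrated with a minimal sketch. It assumes plain relative-frequency estimation over observed (lemma, frame) pairs; the abstract does not state the estimation method, and the function name `frame_probabilities`, the tuple notation for frames, and the toy observations are all hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    observations: iterable of (lemma, frame) pairs, one per verb occurrence.
    Assumption: simple relative-frequency estimation, chosen for illustration.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        probs[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return probs


# Hypothetical toy data: frames written as tuples of grammatical functions,
# with the preposition attached to the oblique ("obl:to").
observations = [
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj")),
    ("sleep", ("subj",)),
]

probs = frame_probabilities(observations)
for lemma, dist in probs.items():
    for frame, p in sorted(dist.items(), key=lambda item: -item[1]):
        print(f"{lemma}([{','.join(frame)}]) {p:.2f}")
```

Because no frame inventory is fixed in advance, the set of frames for each lemma is simply whatever appears in the observations, which mirrors the abstract's point that frames are not predefined. The same division gives the reported average: 14348 semantic form types over 3586 lemmas is roughly 4 per lemma.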

ACL 2004 | ACL 2007 | Function-based Subcategorisation Frames | Mixed Function/category-based Frames | Subcategorisation Frames

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2004
Where	ACL
Authors	Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith, Andy Way
