XML-Based Data Preparation for Robust Deep Parsing

Download www.ltg.ed.ac.uk

We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable, but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage, but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' in real language data and improve parse performance.
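
The idea of backing up a lexicon with POS tag information can be illustrated with a small sketch. This is not the authors' implementation: the element name `w`, the attribute `p`, the toy `LEXICON`, and the `TAG_TO_CATEGORY` mapping are all assumptions made for illustration only. The sketch reads XML-tokenised text, looks each word up in the lexicon, and falls back to a category derived from the POS tag when the word is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon: word form -> lexical categories known to the grammar.
LEXICON = {
    "the": ["DET"],
    "protein": ["N"],
    "binds": ["V"],
}

# Hypothetical mapping from POS tags to coarse grammar categories.
TAG_TO_CATEGORY = {"NN": "N", "NNS": "N", "VBZ": "V", "JJ": "ADJ", "DT": "DET"}


def lexical_entries(sentence_xml):
    """Return (word, categories, source) for each <w> token in the sentence.

    The lexicon is consulted first; the POS tag is used only as a fallback
    for words the lexicon does not cover.
    """
    root = ET.fromstring(sentence_xml)
    entries = []
    for w in root.iter("w"):
        word = (w.text or "").strip()
        tag = w.get("p")  # assumed attribute holding the POS tag
        cats = LEXICON.get(word.lower())
        if cats is not None:
            source = "lexicon"
        elif tag in TAG_TO_CATEGORY:
            cats = [TAG_TO_CATEGORY[tag]]
            source = "pos-tag"
        else:
            cats = []
            source = "unknown"
        entries.append((word, cats, source))
    return entries


# Assumed mark-up: one <s> element per sentence, one <w> element per token.
SAMPLE = (
    '<s><w p="DT">The</w> <w p="NN">protein</w> '
    '<w p="VBZ">binds</w> <w p="NN">calmodulin</w></s>'
)

if __name__ == "__main__":
    for entry in lexical_entries(SAMPLE):
        print(entry)
```

In this sketch the out-of-lexicon word "calmodulin" still receives a noun category from its tag, so a parser consuming these entries would not fail on the missing lexical entry alone.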

Claire Grover, Alex Lascarides

ACL 2001 | ACL 2007 | ANLT Wide-coverage Grammar | Mark-up Tools | XML Tokenisation

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	ACL
Authors	Claire Grover, Alex Lascarides
