Sciweavers

IJCNLP
2005
Springer

Parsing Biomedical Literature

We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically adapted parser achieves a 14.2% reduction in error. With oracle
Matthew Lease, Eugene Charniak
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where IJCNLP
Authors Matthew Lease, Eugene Charniak