Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and the resulting mistakes have consequences for downstream applications. Correct resolution of prepositions and conjunctions is often assumed to depend on lexical dependencies. Because lexical statistics drawn only from the training set are sparse, unlabeled data can help ameliorate this sparsity problem. By incorporating features derived from unlabeled data into a factorization of the problem that matches the representation of prepositions and conjunctions, we achieve a new state of the art for English dependencies, with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.

Emily Pitler

ACL 2012 | Computational Linguistics | Conjunctions | Correct Resolution | Data Features

Post Info

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	ACL
Authors	Emily Pitler
