Reranking and Self-Training for Parser Adaptation


Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard "Charniak parser" checks in at a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus. This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, use of the self-training techniques described in McClosky et al. (2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data. This is remarkable s...
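The error reduction quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the function name `relative_error_reduction` is hypothetical, and it assumes that "error" means 100 minus the f-measure and that the baseline is the 82.9% Brown score of the WSJ-trained parser. Computed from the rounded scores quoted above, the self-training figure comes out slightly above 28%; the small gap to the abstract's number is presumably a rounding effect, since the unrounded scores are not given here.

```python
def relative_error_reduction(baseline_f: float, improved_f: float) -> float:
    """Fraction of the baseline error (100 - f) removed by the improved score."""
    baseline_error = 100.0 - baseline_f
    improved_error = 100.0 - improved_f
    return (baseline_error - improved_error) / baseline_error


# F-measures on the Brown test set, as quoted in the abstract (percent).
baseline = 82.9      # WSJ-trained parser alone (assumed baseline)
reranked = 85.2      # with the reranker
self_trained = 87.8  # with reranking plus self-training

reranker_reduction = relative_error_reduction(baseline, reranked)
self_training_reduction = relative_error_reduction(baseline, self_trained)

print(f"reranker only:       {reranker_reduction:.1%}")
print(f"with self-training:  {self_training_reduction:.1%}")
```

Measuring improvement relative to the remaining error, rather than as a raw difference in f-measure, makes gains comparable across baselines of different strength.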

David McClosky, Eugene Charniak, Mark Johnson


ACL 2006 | ACL 2007 | Parser | Treebank | WSJ Treebank Data |


Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	ACL
Authors	David McClosky, Eugene Charniak, Mark Johnson
