Better Informed Training of Latent Syntactic Features

We study unsupervised methods for learning refinements of the nonterminals in a treebank. Following Matsuzaki et al. (2005) and Prescher (2005), we may for example split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in such a way that they respect patterns of linguistic feature passing: each node's nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned. However, it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: to improve the performance of the EM learner by treebank preprocessing and by annealing methods that split nonterminals selectively. Using these methods, we can maintain high parsing accuracy while dramatically reducing the model size.
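The feature-passing constraint can be pictured with a small sketch. This is an illustration only, not the authors' model or training procedure: the paper learns a PCFG with EM, whereas the code below simply samples annotations for a fixed tree. The tree encoding, the function name `annotate`, and the parameters `num_values` and `p_copy` are all hypothetical. Each nonterminal below the root either copies its parent's latent feature or draws one independently of the parent.

```python
import random

def annotate(tree, num_values, p_copy, rng, parent_feature=None):
    """Return a copy of `tree` with a latent feature on every nonterminal.

    `tree` is a nested tuple (label, child, ...); words are plain strings.
    Hypothetical illustration: features are sampled, not learned.
    """
    if isinstance(tree, str):
        # A word: leave it unannotated.
        return tree
    label, *children = tree
    if parent_feature is not None and rng.random() < p_copy:
        # Feature passing: identical to the parent's feature.
        feature = parent_feature
    else:
        # Otherwise drawn independently of the parent (uniform here, by assumption).
        feature = rng.randrange(num_values)
    annotated_children = (
        annotate(child, num_values, p_copy, rng, feature) for child in children
    )
    return (f"{label}[{feature}]", *annotated_children)

# A toy tree; the sentence and labels are made up for the example.
tree = ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats")))
print(annotate(tree, num_values=2, p_copy=0.7, rng=random.Random(0)))
```

With `num_values=2`, an NP node comes out as either NP[0] or NP[1], matching the split described above. Setting `p_copy=1.0` forces every node to share the root's feature, and `p_copy=0.0` makes every feature independent of its parent.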

Markus Dreyer, Jason Eisner

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	EMNLP
Authors	Markus Dreyer, Jason Eisner
