Unsupervised Multilingual Grammar Induction

We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English), we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an 8.8 absolute gain in F-measure.
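
For readers unfamiliar with the metric, the F-measure in constituency parsing is commonly computed over unlabeled bracket spans. The following is a minimal sketch under that assumption; the abstract does not state the exact evaluation protocol, and the function name `bracket_f1` and the example spans are hypothetical illustrations, not taken from the paper.

```python
from typing import Set, Tuple

# A constituent is represented as (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def bracket_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over unlabeled constituent spans.

    Assumption: spans are compared as exact matches, with no labels.
    """
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical 5-token sentence; these spans are illustrative only.
gold_spans = {(0, 5), (0, 2), (2, 5), (3, 5)}
pred_spans = {(0, 5), (0, 2), (2, 5), (2, 4)}

p, r, f = bracket_f1(gold_spans, pred_spans)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Under this convention, an "8.8 absolute gain" would mean a difference of 8.8 points on the 0 to 100 F-measure scale rather than a relative improvement.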

Benjamin Snyder, Tahira Naseem, Regina Barzilay

ACL 2009 | Bilingual Parallel Corpora | Computational Linguistics | Model | Monolingual

Post Info

Added	16 Feb 2011
Updated	16 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Benjamin Snyder, Tahira Naseem, Regina Barzilay
