EMNLP
2008

Latent-Variable Modeling of String Transductions with Finite-State Methods

String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling and inflectional morphology. We present a conditional loglinear model for string-to-string transduction, which employs overlapping features over latent alignment sequences, and which learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 48%. On a lemmatization task, we reduce the error rates in Wicentowski (2002) by 38
Markus Dreyer, Jason Smith, Jason Eisner
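
To make the abstract's central idea concrete, here is a minimal sketch of a conditional loglinear model that sums over latent character alignments. It is an illustration under strong assumptions, not the authors' implementation: the function names (`alignment_log_score`, `conditional_prob`), the feature scheme (a general edit-type weight plus a specific character-pair weight, so two features overlap on each edit), and the toy weights are all hypothetical. It also normalizes over a small, explicit candidate list rather than over all output strings, and it omits the latent classes and latent string pair regions described above.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def edit_score(kind, a, b, weights):
    """Score of one edit: a general edit-type feature plus a specific
    character-pair feature (both hypothetical; missing weights are 0)."""
    return weights.get(kind, 0.0) + weights.get((a, b), 0.0)


def alignment_log_score(x, y, weights):
    """Log of the total exp-score of all monotone alignments of x to y.

    The alignment is latent, so it is summed out with a forward pass
    over the edit lattice instead of being picked by hand.
    """
    n, m = len(x), len(y)
    chart = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    chart[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0:  # delete x[i-1]
                terms.append(chart[i - 1][j] + edit_score("del", x[i - 1], "", weights))
            if j > 0:  # insert y[j-1]
                terms.append(chart[i][j - 1] + edit_score("ins", "", y[j - 1], weights))
            if i > 0 and j > 0:  # copy or substitute
                kind = "copy" if x[i - 1] == y[j - 1] else "sub"
                terms.append(chart[i - 1][j - 1] + edit_score(kind, x[i - 1], y[j - 1], weights))
            chart[i][j] = logsumexp(terms)
    return chart[n][m]


def conditional_prob(x, y, candidates, weights):
    """p(y | x), normalized over a finite candidate list (a simplification)."""
    if y not in candidates:
        raise ValueError("y must be one of the candidates")
    log_z = logsumexp([alignment_log_score(x, c, weights) for c in candidates])
    return math.exp(alignment_log_score(x, y, weights) - log_z)


if __name__ == "__main__":
    # Toy weights, chosen by hand for illustration only.
    toy_weights = {"copy": 2.0, "ins": -1.0, "del": -1.0, "sub": -2.0, ("", "s"): 1.5}
    outputs = ["walks", "walked", "walk"]
    for out in outputs:
        print(out, conditional_prob("walk", out, outputs, toy_weights))
```

In a trained model the weights would be fit to maximize the conditional likelihood of observed string pairs; here they are fixed by hand so the sketch stays short.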
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where EMNLP
Authors Markus Dreyer, Jason Smith, Jason Eisner