Sciweavers

COLING
2010

Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets

An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Using this grammar, a set of impostor sentences is then created for each English sentence to simulate confusions that would arise if the system were to process an (unavailable) input whose correct English translation is that sentence. An LM is then trained to discriminate between the original sentences and the impostors. The procedure is applied to the IWSLT Chinese-to-English translation task, and promising improvements on a state-of-the-art MT system are demonstrated.

1 Discriminative Language Modeling
A language model (LM) constitutes a crucial component in many tasks such as machine translation...
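
The procedure outlined in the abstract (build confusion sets, generate impostors, train an LM to prefer the original sentence) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the CONFUSIONS table is a hand-written, word-level stand-in for the English-to-English synchronous context-free grammar, the n-gram features are a guess at a plausible feature set, and a structured-perceptron update is swapped in because the abstract does not say which training criterion is used. The function names (impostors, features, score, train) are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical confusion rules: an English word mapped to words that might
# compete with it as translations of the same source fragment. The paper
# derives such alternatives from a synchronous CFG; this table is a toy.
CONFUSIONS = {
    "big": ["large", "great"],
    "house": ["home"],
}


def impostors(sentence):
    """Return every variant of `sentence` reachable by word swaps, minus the original."""
    tokens = sentence.split()
    options = [[tok] + CONFUSIONS.get(tok, []) for tok in tokens]
    variants = {" ".join(choice) for choice in product(*options)}
    variants.discard(sentence)
    return sorted(variants)


def features(sentence):
    """Count unigram (str keys) and bigram (tuple keys) features."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    feats = Counter(tokens[1:-1])          # unigrams, boundary markers excluded
    feats.update(zip(tokens, tokens[1:]))  # bigrams, boundary markers included
    return feats


def score(weights, sentence):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in features(sentence).items())


def train(corpus, epochs=5):
    """Perceptron-style training (an assumed stand-in, not the paper's criterion)."""
    weights = {}
    for _ in range(epochs):
        for original in corpus:
            # Impostors come first so that ties are resolved against the
            # original, which forces an update while the model is uninformed.
            candidates = impostors(original) + [original]
            best = max(candidates, key=lambda s: score(weights, s))
            if best != original:
                for f, v in features(original).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in features(best).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


if __name__ == "__main__":
    corpus = ["the big house"]
    model = train(corpus)
    for candidate in corpus + impostors(corpus[0]):
        print(candidate, score(model, candidate))
```

Note that training here needs only monolingual English text plus the confusion rules, which mirrors the unsupervised setting the abstract describes: no source-language input or reference translation is consulted.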
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Zhifei Li, Ziyuan Wang, Sanjeev Khudanpur, Jason Eisner