Sciweavers

EMNLP
2009

Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases

13 years 9 months ago
Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called "low density" languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus providing access to larger training resources, such as comparable and unrelated monolingual corpora. We present what is to our knowledge the first successful integration of a collocational approach to untranslated words with an end-to-end, state of the art SMT system demonstrating significant translation improvements in a low-resource setting.
Yuval Marton, Chris Callison-Burch, Philip Resnik
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal
Year 2009
Where EMNLP
Authors Yuval Marton, Chris Callison-Burch, Philip Resnik
Comments (0)