Language and Translation Model Adaptation using Comparable Corpora

Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to generate, monolingual data are relatively common. Yet monolingual data have been under-utilized, having been used primarily for training a language model in the target language. This paper describes a novel method for utilizing monolingual target data to improve the performance of a statistical machine translation system on news stories. The method exploits the existence of comparable text--multiple texts in the target language that discuss the same or similar stories as found in the source language document. For every source document that is to be translated, a large monolingual data set in the target language is searched for documents that might be comparable to the source documents. These documents are then used to adapt the MT system to increase the probability of generating texts that resemble the comparable d...
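The abstract is cut off before it says how the comparable documents are found or how the system is adapted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words cosine similarity for retrieval and linear interpolation of unigram language models for adaptation. It also assumes the query is already expressed in target-language words (for example from a rough first-pass translation); matching a real source document against target-language text would need some form of cross-lingual retrieval. All function names and the weight `lam` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def unigram_lm(docs):
    """Maximum-likelihood unigram model over a list of token lists."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_comparable(query_tokens, corpus, k=2):
    """Return the k target-language documents most similar to the query."""
    query = Counter(query_tokens)
    ranked = sorted(corpus, key=lambda doc: cosine(query, Counter(doc)),
                    reverse=True)
    return ranked[:k]


def adapt_lm(generic, bias, lam=0.3):
    """Interpolate a generic model with a model built from comparable text.

    lam is an assumed interpolation weight, not a value from the paper.
    """
    vocab = set(generic) | set(bias)
    return {w: (1 - lam) * generic.get(w, 0.0) + lam * bias.get(w, 0.0)
            for w in vocab}


# Toy monolingual target-language collection (invented for illustration).
corpus = [tokenize(t) for t in [
    "the earthquake struck the coastal city",
    "rescue teams searched after the earthquake",
    "the market closed higher on friday",
    "the team won the football match",
]]

# Hypothetical target-language query terms for one source document.
query = ["earthquake", "rescue", "city"]

generic = unigram_lm(corpus)
comparable = retrieve_comparable(query, corpus, k=2)
adapted = adapt_lm(generic, unigram_lm(comparable), lam=0.3)
```

In this toy setup the adapted model shifts probability mass toward words that occur in the retrieved documents and away from words that occur only elsewhere, which is the effect the abstract describes at a high level. A real system would use a higher-order language model and would also adapt the translation model, as the title indicates.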

Matthew G. Snover, Bonnie J. Dorr, Richard M. Schwartz

EMNLP 2008 | Monolingual Data | Natural Language Processing | Statistical Machine Translation | Target Language

» Finding translations for low-frequency words in comparable corpora

» Learning bilingual translations from comparable corpora to cross-language information retri...

» Using Comparable Corpora to Solve Problems Difficult for Human Translators

» Focused web crawling in the acquisition of comparable corpora

» Resampling auxiliary data for language model adaptation in machine translation for speech

» Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Compara...

» Rare Word Translation Extraction from Aligned Comparable Documents

» Generalising Lexical Translation Strategies for MT Using Comparable Corpora

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	EMNLP
Authors	Matthew G. Snover, Bonnie J. Dorr, Richard M. Schwartz
