Sciweavers

LREC 2010

Using Comparable Corpora to Adapt a Translation Model to Domains

Statistical machine translation (SMT) requires a large parallel corpus, which is available only for a limited set of language pairs and domains. To extend SMT to more language pairs and domains, we developed a method for estimating translation pseudo-probabilities from bilingual comparable corpora. The essence of our method is to calculate pairwise correlations between the words associated with a source-language word (currently restricted to nouns) and its translations; word translation pseudo-probabilities are then calculated on the assumption that the more associated words a translation is correlated with, the higher its translation probability. We also describe a method for calculating noun-sequence translation pseudo-probabilities based on the occurrence frequencies of noun sequences and the translation pseudo-probabilities of their constituent words. Then, we present a framework for merging the translation pseudo-probabilities estimated from in-domain comparable corpora wit...
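The word-level and noun-sequence estimation steps summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not say how correlations are measured or how sequence frequencies and word probabilities are combined. Everything concrete below is a hypothetical choice made for illustration: the seed dictionary, the target-side association scores, the `threshold` parameter, the max-based correlation measure, the frequency-times-product scoring for noun sequences, and the function names `word_pseudo_probs` and `sequence_pseudo_probs`.

```python
from itertools import product
from math import prod
from typing import Dict, List, Set, Tuple


def word_pseudo_probs(
    associated: List[str],                   # source words associated with the source noun
    candidates: List[str],                   # candidate translations of the source noun
    seed_dict: Dict[str, Set[str]],          # HYPOTHETICAL seed bilingual dictionary
    target_assoc: Dict[str, Dict[str, float]],  # HYPOTHETICAL target-corpus association scores
    threshold: float = 0.35,                 # HYPOTHETICAL cutoff for "correlated"
) -> Dict[str, float]:
    """Pseudo-probability of each candidate translation, proportional to the
    number of associated words it is correlated with (assumed measure)."""
    support = {t: 0 for t in candidates}
    for a in associated:
        a_trans = seed_dict.get(a, set())
        for t in candidates:
            # Assumed correlation: strongest target-side association between
            # the candidate t and any dictionary translation of a.
            corr = max((target_assoc.get(t, {}).get(x, 0.0) for x in a_trans), default=0.0)
            if corr >= threshold:
                support[t] += 1
    total = sum(support.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {t: 1.0 / len(candidates) for t in candidates}
    return {t: n / total for t, n in support.items()}


def sequence_pseudo_probs(
    source_seq: Tuple[str, ...],
    word_probs: Dict[str, Dict[str, float]],   # per-word pseudo-probabilities
    target_freq: Dict[Tuple[str, ...], int],   # target-corpus noun-sequence counts
) -> Dict[Tuple[str, ...], float]:
    """HYPOTHETICAL scoring: target-sequence frequency times the product of
    constituent-word pseudo-probabilities, normalized over all candidates."""
    options = [list(word_probs.get(w, {}).items()) for w in source_seq]
    scores = {}
    for combo in product(*options):
        seq = tuple(t for t, _ in combo)
        scores[seq] = target_freq.get(seq, 0) * prod(p for _, p in combo)
    total = sum(scores.values())
    if total == 0:
        return {}
    # Drop candidate sequences that never occur in the target corpus.
    return {seq: s / total for seq, s in scores.items() if s > 0}


if __name__ == "__main__":
    seed = {"water": {"mizu"}, "fuel": {"nenryou"}, "army": {"rikugun"},
            "gun": {"juu"}, "oil": {"abura"}}
    assoc = {"suisou": {"mizu": 0.8, "abura": 0.5, "nenryou": 0.4},
             "sensha": {"rikugun": 0.9, "juu": 0.7, "nenryou": 0.3}}
    p_tank = word_pseudo_probs(["water", "fuel", "army", "gun", "oil"],
                               ["suisou", "sensha"], seed, assoc)
    print(p_tank)  # {'suisou': 0.6, 'sensha': 0.4}
    freq = {("mizu", "suisou"): 10, ("mizu", "sensha"): 1}
    print(sequence_pseudo_probs(("water", "tank"),
                                {"water": {"mizu": 1.0}, "tank": p_tank}, freq))
```

In this toy example, "water", "fuel", and "oil" support the water-tank reading while "army" and "gun" support the military reading, so the first receives the larger share. A graded variant could weight each associated word by its correlation score instead of applying a hard threshold.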
Hiroyuki Kaji, Takashi Tsunakawa, Daisuke Okada
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference