
ACL
2008

Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora

Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. In this paper, we present a novel unsupervised method that automatically extracts the relation between a full-form phrase and its abbreviation from monolingual corpora, and induces translation entries for the abbreviation by using its full-form as a bridge. Our method does not require any additional annotated data other than the data that a regular translation system uses. We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets.
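The "full-form as a bridge" idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the data layout, the toy scores, and the choice to combine scores by multiplication are all assumptions made for illustration. It assumes an abbreviation-to-full-form relation has already been extracted from monolingual text, and that a phrase table already holds translations for the full forms.

```python
def induce_abbreviation_translations(abbr_to_full, phrase_table):
    """Induce translation entries for abbreviations via their full forms.

    abbr_to_full: dict mapping an abbreviation to a list of
        (full_form, relation_score) pairs (hypothetical format).
    phrase_table: dict mapping a full-form phrase to a list of
        (translation, probability) pairs (hypothetical format).
    Returns a dict: abbreviation -> {translation: score}.
    """
    induced = {}
    for abbr, full_forms in abbr_to_full.items():
        for full_form, relation_score in full_forms:
            # Full forms missing from the phrase table contribute nothing.
            for translation, prob in phrase_table.get(full_form, []):
                # Assumption: score an induced entry as the product of the
                # relation score and the translation probability.
                score = relation_score * prob
                entries = induced.setdefault(abbr, {})
                # Keep the best score when several full forms agree.
                if score > entries.get(translation, 0.0):
                    entries[translation] = score
    return induced


# Toy data: the abbreviation 北大 stands for 北京大学 (Peking University).
# The numeric scores are invented for the example.
abbr_to_full = {"北大": [("北京大学", 0.9)]}
phrase_table = {"北京大学": [("Peking University", 0.8)]}
entries = induce_abbreviation_translations(abbr_to_full, phrase_table)
```

Here `entries` maps 北大 to "Peking University", an entry a system could use even if the abbreviation itself never appears in the parallel corpus.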
Zhifei Li, David Yarowsky
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ACL
Authors Zhifei Li, David Yarowsky