Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora



Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated. In this paper, we present a novel unsupervised method that automatically extracts the relation between a full-form phrase and its abbreviation from monolingual corpora, and induces translation entries for the abbreviation by using its full-form as a bridge. Our method does not require any additional annotated data other than the data that a regular translation system uses. We integrate our method into a state-of-the-art baseline translation system and show that it consistently improves the performance of the baseline system on various NIST MT test sets.
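The "full-form as a bridge" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name, the dictionary layouts, the scoring rule (relation score times translation probability) and all numbers below are assumptions made for illustration only. It assumes the full-form/abbreviation relations have already been extracted and that the phrase table already translates the full forms.

```python
from collections import defaultdict


def induce_abbreviation_translations(full_to_abbrevs, phrase_table):
    """Copy translations of each full-form phrase onto its abbreviations.

    full_to_abbrevs: {full_form: {abbreviation: relation_score}}  (hypothetical layout)
    phrase_table:    {source_phrase: {translation: probability}}  (hypothetical layout)
    Returns {abbreviation: {translation: induced_score}}.
    """
    induced = defaultdict(dict)
    for full_form, abbrevs in full_to_abbrevs.items():
        # Translations the baseline system already knows for the full form.
        translations = phrase_table.get(full_form, {})
        for abbrev, relation_score in abbrevs.items():
            if abbrev in phrase_table:
                continue  # the abbreviation is already translatable
            for target, prob in translations.items():
                # Assumed scoring rule: product of the two scores.
                score = relation_score * prob
                # Keep the best score if several full forms give the same target.
                if score > induced[abbrev].get(target, 0.0):
                    induced[abbrev][target] = score
    return dict(induced)


# Toy data; the scores are made up for illustration.
toy_relations = {
    "香港总督": {"港督": 0.9},
    "北京大学": {"北大": 0.8},
}
toy_phrase_table = {
    "香港总督": {"Hong Kong Governor": 0.5, "Governor of Hong Kong": 0.25},
    "北京大学": {"Peking University": 1.0},
}

entries = induce_abbreviation_translations(toy_relations, toy_phrase_table)
for abbrev, targets in entries.items():
    print(abbrev, targets)
```

In this sketch an abbreviation that already has phrase-table entries is skipped, so only previously unknown words receive induced translations.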

Zhifei Li, David Yarowsky


ACL 2008 | Chinese Abbreviations | Computational Linguistics | Modern Chinese Texts | Translation System |


Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	ACL
Authors	Zhifei Li, David Yarowsky
