Sciweavers

COLING
2008

Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation

Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation model which uses both monolingual and bilingual information to derive a segmentation suitable for MT. Experiments show that our method improves a state-of-the-art MT system in a small and a large data environment.
Jia Xu, Jianfeng Gao, Kristina Toutanova, Hermann Ney
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where COLING
Authors Jia Xu, Jianfeng Gao, Kristina Toutanova, Hermann Ney