Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation

Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained on manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation model which uses both monolingual and bilingual information to derive a segmentation suitable for MT. Experiments show that our method improves a state-of-the-art MT system in both a small and a large data environment.
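
To illustrate why the lack of delimiters is a problem, the following minimal sketch enumerates every way to split an unsegmented string into words from a fixed lexicon. It is not the Bayesian model proposed in the paper; the function name `segmentations`, the toy lexicon, and the example sentence are all hypothetical and chosen only to show that a fixed lexicon can license more than one segmentation of the same text.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words that all appear in `lexicon`."""
    if not text:
        # The empty string has exactly one segmentation: no words at all.
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon (an assumption for illustration, not from the paper).
lexicon = {"研究", "研究生", "生命", "命", "起源"}
sentence = "研究生命起源"

for seg in segmentations(sentence, lexicon):
    print(" / ".join(seg))
```

With this toy lexicon the sentence has two valid segmentations, and nothing in the lexicon alone says which one is better for translation. That choice is the kind of decision the abstract argues should take bilingual information into account.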

Jia Xu, Jianfeng Gao, Kristina Toutanova, Hermann Ney

Chinese Word | COLING 2008 | Computational Linguistics | Such Word Segmentation | Word Segmentation |

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	COLING
Authors	Jia Xu, Jianfeng Gao, Kristina Toutanova, Hermann Ney