Approaches to improving corpus quality for statistical machine translation

The performance of a statistical machine translation (SMT) system depends heavily on the quantity and quality of its bilingual language resources. However, previous work has mainly focused on quantity, trying to collect more bilingual data. In this paper, we aim to optimize the bilingual corpus to improve the performance of the translation system. We propose methods that process the bilingual data by filtering noise and selecting more informative sentences from the training corpus and the development corpus. The experimental results show that we can obtain competitive performance using less data than when using all available data.
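
The abstract does not say how noise is filtered, so the following is only an illustrative sketch of one common heuristic for cleaning a parallel corpus, not the authors' method. It drops sentence pairs that have an empty side, are overly long, or have an implausible source/target length ratio. The function name `filter_noisy_pairs`, the thresholds `max_ratio` and `max_len`, the whitespace tokenization, and the sample sentences are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]


def filter_noisy_pairs(
    pairs: Iterable[SentencePair],
    max_ratio: float = 3.0,   # assumed threshold, not from the paper
    max_len: int = 100,       # assumed threshold, not from the paper
) -> List[SentencePair]:
    """Keep sentence pairs that pass simple length-based sanity checks."""
    kept: List[SentencePair] = []
    for src, tgt in pairs:
        # Whitespace tokenization is an assumption; unsegmented languages
        # would need a proper word segmenter first.
        src_len = len(src.split())
        tgt_len = len(tgt.split())

        # Drop pairs where either side is empty.
        if src_len == 0 or tgt_len == 0:
            continue
        # Drop pairs where either side is overly long.
        if src_len > max_len or tgt_len > max_len:
            continue
        # Drop pairs whose lengths differ too much to be plausible translations.
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio > max_ratio:
            continue

        kept.append((src, tgt))
    return kept


# Hypothetical sample data for illustration only.
corpus = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("hello", ""),
    ("yes", "this target side is far too long to be a translation of one word"),
]
clean = filter_noisy_pairs(corpus)
```

With these assumed thresholds, only the first pair is kept: the second has an empty target side and the third has a length ratio well above 3. Selecting "more informative" sentences, as the abstract describes, would need a separate scoring step that the abstract does not specify.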

Peng Liu, Yu Zhou, Chengqing Zong

Bilingual Language | Bilingual Language Resource | ICMLC 2010 | Machine Learning | Statistical Machine Translation |

Post Info

Added	26 Jan 2011
Updated	26 Jan 2011
Type	Journal
Year	2010
Where	ICMLC
Authors	Peng Liu, Yu Zhou, Chengqing Zong

Sciweavers
