Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora

The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
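
As a rough illustration of the idea in the abstract, the sketch below trains a simplified IBM Model 1 lexical translation table with EM on sentence-aligned pairs and adds the links from word-aligned pairs as observed counts. This is not the authors' implementation. It assumes Model 1 without a NULL word, and the function name `train_model1`, the `weight` parameter (how much each hand-made link counts), and the toy data are all hypothetical.

```python
from collections import defaultdict


def train_model1(sentence_pairs, aligned_pairs=(), weight=1.0, iterations=5):
    """Estimate t(f | e) for a simplified IBM Model 1 (no NULL word).

    sentence_pairs: list of (src_tokens, tgt_tokens), alignment unknown.
    aligned_pairs:  list of (src_tokens, tgt_tokens, links), where links is a
                    set of (i, j) index pairs linking src[i] to tgt[j].
    weight:         hypothetical scaling factor for each observed link.
    """
    src_vocab = {f for src, _ in sentence_pairs for f in src}
    src_vocab |= {f for src, _, _ in aligned_pairs for f in src}
    uniform = 1.0 / max(len(src_vocab), 1)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected or observed count of (f, e)
        total = defaultdict(float)  # normaliser for each e

        # E-step: fractional counts from sentence-aligned data.
        for src, tgt in sentence_pairs:
            for f in src:
                z = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # Word-aligned data: links are observed, so no expectation is needed.
        for src, tgt, links in aligned_pairs:
            for i, j in links:
                count[(src[i], tgt[j])] += weight
                total[tgt[j]] += weight

        # M-step: renormalise counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})

    return t


# Toy usage with invented data.
sentence_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]
aligned_pairs = [(["ein", "buch"], ["a", "book"], {(0, 0), (1, 1)})]
table = train_model1(sentence_pairs, aligned_pairs)
print(table[("buch", "book")], table[("das", "book")])
```

Raising `weight` or adding more word-aligned pairs lets the observed links outweigh the expected counts, which is one simple way to vary the balance between the two kinds of data.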

Chris Callison-Burch, David Talbot, Miles Osborne

ACL 2004 | ACL 2007 | Alignment Error Rate | BLEU Score | Models Reduces Alignment |

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2004
Where	ACL
Authors	Chris Callison-Burch, David Talbot, Miles Osborne
