Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: a good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combining the entailment and traditional features yields further improvements.