We examine correlations between native speaker judgements on automatically generated German text and automatic evaluation metrics. We look at a number of metrics from the MT a...
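As a concrete illustration of the kind of analysis this abstract describes (not taken from the paper itself), the sketch below correlates per-segment human judgements with automatic metric scores using Pearson and Spearman coefficients; all scores and variable names here are hypothetical placeholders.

```python
# Minimal sketch: correlating human judgements with metric scores.
# The numbers are invented placeholders, not data from any paper.
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-segment scores for the same set of generated sentences.
human_judgements = [4.5, 3.0, 2.5, 4.0, 1.5]       # e.g. adequacy ratings
metric_scores = [0.72, 0.41, 0.38, 0.65, 0.20]     # e.g. per-segment metric values

r, r_p = pearsonr(human_judgements, metric_scores)      # linear correlation
rho, rho_p = spearmanr(human_judgements, metric_scores)  # rank correlation
print(f"Pearson r = {r:.3f} (p = {r_p:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
```

Rank correlation (Spearman) is often preferred in this setting, since human ratings are ordinal and metric scores need not relate to them linearly.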
A number of approaches to automatic MT evaluation based on deep linguistic knowledge have been suggested. However, n-gram based metrics remain the dominant approach today. The ...
Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We beli...
This paper presents METEOR-NEXT, an extended version of the METEOR metric designed to have high correlation with post-editing measures of machine translation quality. We describe c...
Automatic evaluation of Machine Translation (MT) quality is essential to developing high-quality MT systems. Various evaluation metrics have been proposed, and BLEU is now used as ...
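For reference, here is a minimal sketch of corpus-level BLEU scoring. The choice of the sacrebleu library is an assumption; the abstract does not name any particular implementation, and the example strings are hypothetical.

```python
# Minimal sketch of corpus-level BLEU, assuming the sacrebleu library.
import sacrebleu

hypotheses = ["the cat sat on the mat"]           # system outputs, one string per segment
references = [["the cat is sitting on the mat"]]  # references[i][j]: i-th reference for segment j

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # BLEU on a 0-100 scale
```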