AMTA
2004
Springer

The Significance of Recall in Automatic Metrics for MT Evaluation

Recent research has shown that a balanced harmonic mean (F1 measure) of unigram precision and recall outperforms the widely used BLEU and NIST metrics for Machine Translation evaluation in terms of correlation with human judgments of translation quality. We show that significantly better correlations can be achieved by placing more weight on recall than on precision. This may seem unexpected, since BLEU and NIST focus on n-gram precision and disregard recall, yet our experiments show that correlation with human judgments is highest when almost all of the weight is assigned to recall. We also show that stemming is significantly beneficial not only to simpler metrics based on unigram precision and recall, but also to BLEU and NIST.
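The idea of shifting weight from precision to recall can be made concrete with a weighted harmonic mean. The sketch below is a minimal illustration, not the authors' implementation. It assumes whitespace tokenization, clipped unigram matching (each reference token matches at most once), and the form F = P·R / (α·P + (1 − α)·R), where α is the weight on recall. With α = 0.5 this reduces to F1, and as α approaches 1 it approaches plain recall. The `toy_stem` function is a hypothetical crude suffix stripper standing in for a real stemmer. It is not the stemmer used in the paper.

```python
from collections import Counter


def unigram_matches(hyp_tokens, ref_tokens):
    """Count clipped unigram matches: each reference token can be matched at most once."""
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    return sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())


def precision_recall(hyp, ref, stem=None):
    """Unigram precision and recall of a hypothesis against one reference.

    Assumption: lowercased whitespace tokenization; `stem` is an optional token -> token function.
    """
    hyp_tokens = hyp.lower().split()
    ref_tokens = ref.lower().split()
    if stem is not None:
        hyp_tokens = [stem(t) for t in hyp_tokens]
        ref_tokens = [stem(t) for t in ref_tokens]
    matches = unigram_matches(hyp_tokens, ref_tokens)
    precision = matches / len(hyp_tokens) if hyp_tokens else 0.0
    recall = matches / len(ref_tokens) if ref_tokens else 0.0
    return precision, recall


def weighted_fmean(precision, recall, alpha):
    """Weighted harmonic mean with weight `alpha` on recall (0.5 gives F1, values near 1 approach recall)."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (precision * recall) / (alpha * precision + (1.0 - alpha) * recall)


def toy_stem(token):
    """Hypothetical crude suffix stripper for illustration only (not the Porter stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    hyp = "the cats sat on the mat"
    ref = "the cat is sitting on the mat"
    p, r = precision_recall(hyp, ref)
    for alpha in (0.5, 0.9):
        print(f"alpha={alpha}: P={p:.3f} R={r:.3f} F={weighted_fmean(p, r, alpha):.3f}")
    ps, rs = precision_recall(hyp, ref, stem=toy_stem)
    print(f"stemmed: P={ps:.3f} R={rs:.3f}")
```

In this example, unstemmed matching gives P = 4/6 and R = 4/7. Raising α from 0.5 to 0.9 pulls the score toward recall. Stemming lets "cats" match "cat", which raises both precision and recall to 5/6 and 5/7.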
Alon Lavie, Kenji Sagae, Shyamsundar Jayaraman
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where AMTA