If two translation systems differ differ in performance on a test set, can we trust that this indicates a difference in true system quality? To answer this question, we describe b...
A number of approaches to Automatic MT Evaluation based on deep linguistic knowledge have been suggested. However, n-gram based metrics are still today the dominant approach. The ...
We propose three new features for MT evaluation: source-sentence constrained n-gram precision, source-sentence reordering metrics, and discriminative unigram precision, as well as...