Five image segmentation algorithms are evaluated: mean shift, normalised cuts, efficient graph-based segmentation, hierarchical watershed, and waterfall. The evaluation is done us...
We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three crit...
When comparing clustering results, any evaluation metric reduces the available information to a single number. However, many evaluation metrics exist that are not ...
Elke Achtert, Sascha Goldhofer, Hans-Peter Kriegel...
Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better syste...
Commonly used coreference resolution evaluation metrics can only be applied to key mentions, i.e. already annotated mentions. Here we propose two variants of the B³ and CEAF coref...