
INLG 2010, Springer
Comparing Rating Scales and Preference Judgements in Language Evaluation
Rating-scale evaluations are common in NLP, but are problematic for a range of reasons, e.g. they can be unintuitive for evaluators, inter-evaluator agreement and self-consistency...
Anja Belz, Eric Kow