INLG 2010, Springer
Comparing Rating Scales and Preference Judgements in Language Evaluation
Rating-scale evaluations are common in NLP, but are problematic for a range of reasons, e.g. they can be unintuitive for evaluators, inter-evaluator agreement and self-consistency...
Anja Belz, Eric Kow