We present a series of experiments to demonstrate the validity of Relative Utility (RU) as a measure for evaluating extractive summarizers. RU is applicable in both single-document and multi-document summarization, is extendable to arbitrary compression rates with no extra annotation effort, and takes into account both random system performance and interjudge agreement. Our results using the JHU summary corpus indicate that RU is a reasonable and often superior alternative to several common evaluation metrics.

Categories and Subject Descriptors
H.3.3 [Information search and retrieval]: Question Answering and Text Summarization

General Terms
Measurement, Performance, Experimentation

Keywords
Text Summarization, Evaluation, Relative Utility

Dragomir R. Radev, Daniel Tam