Predictive accuracy has been used as the main, and often the only, evaluation criterion for the predictive performance of classification learning algorithms. In recent years, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has been proposed as an alternative single-number measure for evaluating learning algorithms. In this paper, we prove that AUC is a better measure than accuracy. More specifically, we present rigorous definitions of consistency and discriminancy for comparing two evaluation measures for learning algorithms. We then present empirical evaluations and a formal proof to establish that AUC is indeed statistically consistent and more discriminating than accuracy. Our result is quite significant since we formally prove, for the first time, that AUC is a better measure than accuracy in the evaluation of learning algorithms.
Charles X. Ling, Jin Huang, Harry Zhang
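
As an informal illustration of the abstract's claim that AUC is more discriminating than accuracy, the following minimal sketch compares the two measures on a toy example. It does not reproduce the paper's formal definitions or proof. The labels, scores, the 0.5 threshold, and the function names `accuracy` and `auc` are all hypothetical choices made for illustration. AUC is computed here through its pairwise-ranking form (the Wilcoxon-Mann-Whitney statistic), which is assumed to be adequate for a small example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption made for this illustration.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: eight examples scored by two hypothetical classifiers.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
scores_b = [0.1, 0.2, 0.3, 0.35, 0.95, 0.7, 0.8, 0.9]

# Both classifiers make the same two errors at the threshold,
# so accuracy cannot tell them apart.
print(accuracy(labels, scores_a), accuracy(labels, scores_b))

# AUC differs because classifier B ranks one negative above every positive.
print(auc(labels, scores_a), auc(labels, scores_b))
```

In this toy case both classifiers have the same accuracy, while the AUC values differ because AUC depends on the full ranking of the scores and not only on which side of the threshold each score falls.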