We analyze critically the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and standard benchmark data sets. The results raise serious concerns about the use of accuracy for comparing classifiers and draw into question the conclusions that can be drawn from such studies. In the course of the presentation, we describe and demonstrate what we believe to be the proper use of ROC analysis for comparative studies in machine learning research. We argue that this methodology is preferable both for making practical choices and for drawing scientific conclusions.
Foster J. Provost, Tom Fawcett, Ron Kohavi
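
As a minimal illustration of the contrast drawn above between accuracy and ROC analysis, the following sketch compares two hypothetical classifiers on a small made-up data set. The labels, the scores, the 0.5 decision threshold, and the helper names `roc_points`, `auc`, and `accuracy` are all assumptions introduced for this example; none of them come from the study itself. The area under the ROC curve is used here only as one convenient summary of a curve, not as the full methodology.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score so that lowering the threshold admits one score level at a time.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        level = pairs[i][0]
        # Instances with tied scores are admitted together and yield a single ROC point.
        while i < len(pairs) and pairs[i][0] == level:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of instances classified correctly at one fixed threshold."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


# Hypothetical data: 4 positive and 4 negative instances.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Classifier A ranks every positive above every negative, but all of its scores fall below 0.5.
scores_a = [0.45, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
# Classifier B makes ranking errors, but its scores straddle the 0.5 threshold.
scores_b = [0.90, 0.80, 0.70, 0.20, 0.60, 0.30, 0.10, 0.05]

acc_a, acc_b = accuracy(labels, scores_a), accuracy(labels, scores_b)
auc_a, auc_b = auc(roc_points(labels, scores_a)), auc(roc_points(labels, scores_b))

print("accuracy at threshold 0.5:", acc_a, acc_b)
print("area under ROC curve:    ", auc_a, auc_b)
```

On this toy data, accuracy at the fixed threshold favors classifier B (0.75 against 0.5), while the ROC curve of classifier A dominates that of B (area 1.0 against 0.875). The disagreement arises because accuracy evaluates a single operating point under one assumed class distribution and cost setting, whereas the ROC curve describes performance across all thresholds.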