In this paper, we describe two related approaches to estimating the sample sizes required to statistically compare the performance of two classifiers: acceptable failure rates (AFR...
Traditionally, machine learning algorithms have been evaluated in applications where assumptions can be reliably made about class priors and/or misclassification costs. In this pa...