We propose and motivate an alternative to the traditional error-based or cost-based evaluation metrics for the goodness of speaker detection performance. The metric that we propose is an information-theoretic one, which measures the effective amount of information that the speaker detector delivers to the user. We show that this metric is appropriate for the evaluation of what we call application-independent detectors, which output soft decisions in the form of loglikelihood-ratios, rather than hard decisions. The proposed metric is constructed via analysis and generalization of cost-based evaluation metrics. This construction forms an interpretation of this metric as an expected cost, or as a total error-rate, over a range of different application-types. We further show how the metric can be decomposed into a discrimination and a calibration component. We conclude with an experimental demonstration of the proposed technique to evaluate three speaker detection systems submitted to the...
Niko Brümmer, Johan A. du Preez