We document methods for the quantitative evaluation of systems that produce a scalar summary of a biometric sample’s quality. We are motivated by a need to test claims that quality measures are predictive of matching performance. We regard a quality measurement algorithm as a black box that converts an input sample to an output scalar, and we evaluate it by quantifying the association between those values and observed matching results. We advance detection error trade-off and error versus reject characteristics as metrics for the comparative evaluation of sample quality measurement algorithms. We precede this with a definition of sample quality and a description of the operational use of quality measures. We emphasize the performance goal by including a procedure for annotating the samples of a reference corpus with quality values derived from empirical recognition scores.
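To make the error versus reject idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each genuine comparison already carries a single scalar "pair quality" (for example, the lower of the two samples' quality values) and that the decision threshold is held fixed. It discards the lowest-quality fraction of comparisons and reports the false non-match rate (FNMR) over the comparisons that remain. The function name `error_vs_reject` and its parameters are hypothetical.

```python
import numpy as np


def error_vs_reject(genuine_scores, pair_quality, threshold, reject_fractions):
    """Sketch of an error-versus-reject characteristic (hypothetical helper).

    genuine_scores:   similarity scores of genuine comparisons (higher = more similar).
    pair_quality:     one scalar quality per comparison (assumed, e.g. the minimum
                      of the two samples' quality values).
    threshold:        fixed decision threshold; a genuine score below it counts
                      as a false non-match.
    reject_fractions: fractions of the lowest-quality comparisons to discard.

    Returns the FNMR over the retained comparisons for each reject fraction.
    """
    scores = np.asarray(genuine_scores, dtype=float)
    quality = np.asarray(pair_quality, dtype=float)

    # Sort comparisons by ascending quality so the worst ones come first.
    order = np.argsort(quality, kind="stable")
    sorted_scores = scores[order]
    n = len(sorted_scores)

    fnmr = []
    for f in reject_fractions:
        # Number of lowest-quality comparisons to reject (nearest whole number).
        k = int(round(f * n))
        kept = sorted_scores[k:]
        # FNMR among retained comparisons; undefined if everything was rejected.
        fnmr.append(float(np.mean(kept < threshold)) if kept.size else float("nan"))
    return np.array(fnmr)


# Usage: five genuine comparisons, threshold 0.5.
print(error_vs_reject([0.2, 0.9, 0.4, 0.8, 0.7], [1, 5, 2, 4, 3], 0.5, [0.0, 0.2, 0.4]))
```

For a quality measure that is predictive of matching performance, this curve should fall as the reject fraction grows, because the false non-matches concentrate in the low-quality comparisons. In the toy data above, FNMR drops from 0.4 with nothing rejected to 0.0 once the two lowest-quality comparisons are removed. Holding the threshold fixed, rather than re-tuning it after each rejection, isolates the effect of the quality values themselves.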