Despite their effectiveness for robust speech processing, missing data techniques are vulnerable to errors in the classification of the input speech signal’s time-frequency points. A direct method for removing these mask errors is top-down optimization of the estimated mask; however, this requires a measure that evaluates mask quality without a priori knowledge of the noise. In this paper, we propose the normalized likelihood confidence as such a criterion for robust speaker recognition. In this approach, the accuracy with which an estimated mask classifies time-frequency points as corrupt or reliable is related to the confidence of its likelihood score. This relationship is based on the conceptual effect of binary mask errors on the model likelihood distributions produced by accumulated marginalization densities. Experimental results confirm a relationship between the normalized likelihood distance and the accuracy of the time-frequency masks produced by various estimation strategies.