Traditionally, the use of untranscribed speech has been restricted to unsupervised or semi-supervised training of acoustic models, while comparing recognizers has required labeled data. In this paper we show how recognizers may be rank-ordered by performance using only a large quantity of untranscribed data, given a third “reference” recognizer. We develop statistical tests for comparing recognizers in this scenario. The accuracy of the reference system need not be known: although it affects the amount of data required, given enough data the reference system only needs to perform better than chance. We show through detailed experiments that the rank ordering predicted from untranscribed data is indeed correct.
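To make the idea concrete, the following is a minimal illustrative sketch, not the statistical tests developed in this paper. It assumes a simplified setting of independent binary decisions per item, in which each system errs independently of the others. Under those assumptions, the probability that a system with accuracy p agrees with a reference of accuracy r is (1 - r) + p(2r - 1), which increases with p whenever r > 0.5. Agreement with the reference can therefore stand in for accuracy. The function names `sign_test_vs_reference` and `simulate`, and all accuracies used, are hypothetical choices for illustration.

```python
import random
from math import comb


def sign_test_vs_reference(out_a, out_b, out_ref):
    """Two-sided exact sign test on items where exactly one system
    agrees with the reference. Returns (wins_a, wins_b, p_value)."""
    wins_a = wins_b = 0
    for a, b, r in zip(out_a, out_b, out_ref):
        if a == r and b != r:
            wins_a += 1
        elif b == r and a != r:
            wins_b += 1
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    # Under the null of equal agreement rates, wins_a ~ Binomial(n, 0.5).
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1))
    p_value = min(1.0, 2 * tail / 2 ** n)
    return wins_a, wins_b, p_value


def simulate(n_items, acc_a, acc_b, acc_ref, seed=0):
    """Toy data (an assumption of this sketch): binary truth, and each
    system flips the true label independently with probability 1 - acc."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_items)]

    def noisy(acc):
        return [t if rng.random() < acc else 1 - t for t in truth]

    # The truth is used only to generate outputs; the test never sees it.
    return noisy(acc_a), noisy(acc_b), noisy(acc_ref)


if __name__ == "__main__":
    out_a, out_b, out_ref = simulate(20000, 0.90, 0.80, 0.70)
    wins_a, wins_b, p = sign_test_vs_reference(out_a, out_b, out_ref)
    print(f"A-only agreements: {wins_a}, B-only agreements: {wins_b}, p = {p:.3g}")
```

In this toy setting the reference is only 70% accurate, yet the more accurate system collects more sole agreements with it, and no transcriptions enter the test. A less accurate reference narrows the gap between the two counts, so more data is needed to reach significance. Real recognizer errors are correlated and multi-class, so the independence assumption here should be read as a simplification.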