Performance evaluation of document recognition systems is a difficult and practically important problem. Issues arise in defining requirements, in characterizing the system's range of inputs and outputs, in interpreting published evaluation results, in reproducing evaluation experiments, in choosing training and test data, and in selecting performance metrics. We discuss these issues in the context of evaluating systems for the recognition of mathematical expressions. Excellent progress has been made in the theory and practice of performance evaluation, but many open problems remain.