Document recognition involves many kinds of hypotheses: segmentation hypotheses, classification hypotheses, spatial relationship hypotheses, and so on. Many recognition strategies generate valid hypotheses which are eventually rejected, but current evaluation methods consider only accepted hypotheses. As a result, we have no way to measure errors associated with rejecting valid hypotheses. We propose describing hypothesis generation in more detail, by collecting the complete set of generated hypotheses and computing the recall and precision of this set: we call these the ‘historical recall’ and ‘historical precision.’ Using table cell detection examples, we demonstrate how historical recall and precision along with the complete set of generated hypotheses assist in the evaluation, debugging, and design of recognition strategies.
Richard Zanibbi, Dorothea Blostein, James R. Cordy