Developing better systems for document image analysis requires understanding errors, their sources, and their effects. The interactions between the various processing steps are complex, with details that are often obscured by the statistical methods commonly employed. In this paper, we describe tools we are building to help the user view and understand the results of common document analysis procedures. Unlike existing platforms for ground-truthing page images, our system also allows users to visualize the results of automated error analyses. Recognition errors can be corrected interactively, with the effort to do so recorded as a measure that is useful in performance evaluation. Beyond this functionality for exploring error behavior, we consider how such tools could be designed to incrementally improve the quality of collections of poorly recognized documents as users interact with them on a regular basis. We conclude by discussing topics for future research.
Daniel P. Lopresti, George Nagy