Degraded documents are frequently obtained in various situations. Examples of degraded document collections include historical document depositories, documents obtained in legal an...
Line segmentation is the first and the most critical pre-processing step for a document recognition/analysis task. Complex handwritten documents with lines running into each other...
Kamal Kuzhinjedathu, Harish Srinivasan, Sargur N. ...
We describe an approach to unsupervised high-accuracy recognition of the textual contents of an entire book using fully automatic mutual-entropy-based model adaptation. Given imag...
Word segmentation is the most critical pre-processing step for any handwritten document recognition/retrieval system. This paper describes an approach to separate a line of uncons...
Adaptive binarization is an important first step in many document analysis and OCR processes. This paper describes a fast adaptive binarization algorithm that yields the same qual...
There is strong demand for automated tools that extract pertinent information from the biomedical literature, which is a rich, complex, and dramatically growing resou...
We describe a methodology for retrieving document images from large, extremely diverse collections. First we perform content extraction, that is, the location and measurement of reg...
In this paper, we revisit the problem of detecting the page numbers of a document. This work is motivated by the need for a generic method that applies to a large variety of docume...