For a very long time, it has been considered that the only way of automatically extracting similar groups of words from a text collection for which no semantic information exists ...
A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. G...
ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we h...
Generation of ground-truths is of great importance for unbiased performance evaluation of document layout analysis methods. This is especially necessary because many methods are c...