Sciweavers

DAS
2008
Springer

A Complete Optical Character Recognition Methodology for Historical Documents

This paper presents a complete OCR methodology for recognizing historical documents, printed or handwritten, without any prior knowledge of the font. The methodology consists of three steps: the first two build a training database from a set of documents, and the third recognizes new document images. First, a pre-processing step binarizes and enhances the image. Second, a top-down segmentation approach detects text lines, words and characters. A clustering scheme then groups characters of similar shape. The procedure is semi-automatic: the user can intervene at any time to correct clustering errors and assign an ASCII label. The resulting database is then used for recognition. Finally, in the third step, the same segmentation approach is applied to every new document image, while the recognition is...
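The pre-processing and top-down segmentation steps can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: a fixed global threshold stands in for the binarization step, and projection profiles (counting ink pixels per row, then per column) stand in for the detection of text lines and of the components within a line. All function names are hypothetical, and this is not the authors' algorithm.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0.

    Assumption: a fixed global threshold; the abstract does not say
    which binarization technique the paper uses.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def find_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive nonzero entries."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Text lines as runs of rows that contain at least one ink pixel."""
    return find_runs([sum(row) for row in binary])


def segment_columns(binary: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Within rows [top, bottom), runs of columns that contain ink."""
    if not binary:
        return []
    width = len(binary[0])
    profile = [sum(binary[y][x] for y in range(top, bottom)) for x in range(width)]
    return find_runs(profile)
```

Going top down, `segment_lines` is called first on the binarized page, and `segment_columns` is then applied to each returned row range. The sketch does not distinguish words from characters; a real implementation would additionally need some rule on gap widths, and degraded historical pages would likely need more than a global threshold.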
Added 19 Oct 2010
Updated 19 Oct 2010
Type Conference
Year 2008
Where DAS
Authors Georgios Vamvakas, Basilios Gatos, Nikolaos Stamatopoulos, Stavros J. Perantonis