In this paper, we present a methodology for off-line character recognition that mainly focuses on handling the difficult cases of historical fonts and styles. The proposed methodology relies on a new feature extraction technique based on recursive subdivisions of the image as well as on calculation of the centre of masses of each sub-image with sub-pixel accuracy. Feature extraction is followed by a hierarchical classification scheme based on the level of granularity of the feature extraction method. Pairs of classes with high values in the confusion matrix are merged at a certain level and higher level granularity features are employed for distinguishing them. Several historical documents were used in order to demonstrate the efficiency of the proposed technique.
Georgios Vamvakas, Basilios Gatos, Stavros J. Pera