We present a complete online handwritten character recognition system for Indian languages that handles ambiguities in both the segmentation and the recognition of strokes. Recognition is based on a generative model of handwriting formation, coupled with a discriminative model for stroke classification. This approach can seamlessly integrate language and script information in the generative model, and it distinguishes similar strokes using the discriminative stroke classification model. Recognition is performed in a purely bottom-up fashion, starting with the strokes; the ambiguities at each stage are preserved and passed on to the next stage, so that the most probable result can be obtained at every stage. We also present the results of various preprocessing, feature selection, and classification studies on a large data set collected from native writers of two Indian languages: Malayalam and Telugu. The system achieves a stroke-level accuracy of 95.78% and 95.12%...
Amit Arora, Anoop M. Namboodiri