A post-processing scheme for Malayalam using statistical sub-character language models

Most Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at the recognition/classification stage. However, word-level accuracies are still poor. It is well known that word accuracy decreases as the number of characters in a word increases. For Malayalam, the average number of characters in a word is almost twice that of English. Moreover, the number of words required to cover 80% of the Malayalam language is more than forty times that of other Indian languages such as Hindi. Hence a direct dictionary-based post-processing scheme is not suitable for Malayalam. In this paper, we propose a post-processing scheme which uses statistical language models at the sub-character level to boost word-level recognition results. We use a multi-stage graph representation and formulate the recognition task as an optimization problem. Edges of the graph encode the language information and nodes represent the visual similarities. An optim...
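
The multi-stage graph idea in the abstract can be illustrated with a small sketch. The abstract is truncated before it names the optimization procedure, so the code below is not the authors' algorithm: it assumes a generic Viterbi-style dynamic program, uses Latin letters as stand-ins for Malayalam sub-character units, and all probabilities, along with the names `decode_best_path`, `visual`, and `bigram`, are invented for illustration.

```python
import math

def decode_best_path(stages, node_logp, edge_logp):
    """Viterbi-style search over a multi-stage graph (illustrative only).

    stages: stages[t] lists the candidate labels at position t.
    node_logp(t, label): log-score of a node (stand-in for visual similarity).
    edge_logp(prev, label): log-score of an edge (stand-in for the language model).
    """
    # best[label] = (log-score of the best path ending in label, that path)
    best = {c: (node_logp(0, c), [c]) for c in stages[0]}
    for t in range(1, len(stages)):
        new_best = {}
        for c in stages[t]:
            # Pick the predecessor that maximizes path score plus edge score.
            prev, (score, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + edge_logp(kv[0], c),
            )
            new_best[c] = (score + edge_logp(prev, c) + node_logp(t, c), path + [c])
        best = new_best
    score, path = max(best.values(), key=lambda v: v[0])
    return path, score

# Hypothetical classifier confidences for each candidate at each position.
visual = [
    {"c": 0.45, "e": 0.55},
    {"a": 0.40, "o": 0.60},
    {"t": 0.60, "l": 0.40},
]
# Hypothetical bigram probabilities; unseen pairs get a small default.
bigram = {("c", "a"): 0.7, ("a", "t"): 0.8, ("e", "o"): 0.1, ("o", "l"): 0.2}
DEFAULT_BIGRAM = 0.05

stages = [list(v) for v in visual]
path, score = decode_best_path(
    stages,
    node_logp=lambda t, c: math.log(visual[t][c]),
    edge_logp=lambda p, c: math.log(bigram.get((p, c), DEFAULT_BIGRAM)),
)

# Baseline without language information: take the top candidate at each stage.
greedy = "".join(max(v, key=v.get) for v in visual)
decoded = "".join(path)
```

In this toy setup the per-position classifier output (`greedy`) is "eot", while the path that also accounts for the edge scores (`decoded`) is "cat". This is the kind of correction that sub-character language information can provide without a word dictionary.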

Karthika Mohan, C. V. Jawahar

DAS 2010 | Document Analysis | Post-processing Scheme | Word Level | Word Level Accuracies |

Post Info

Added	12 Aug 2010
Updated	12 Aug 2010
Type	Conference
Year	2010
Where	DAS
Authors	Karthika Mohan, C. V. Jawahar
