This paper presents an Optical Character Recognition (OCR) system for printed Urdu, a popular Indian script. Developing OCR for this script is difficult because (i) a large number of characters must be recognized and (ii) many characters have similar shapes. In the proposed system, individual characters are recognized using a combination of topological, contour, and water-reservoir-concept-based features. The feature detection methods are simple and robust. A prototype of the system has been tested on printed Urdu characters and currently achieves 97.8% character-level accuracy on average.
U. Pal, Anirban Sarkar
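
The abstract names water-reservoir-based features without defining them. As a rough illustration only, the idea is that water poured onto a character from above collects in its cavities, and the size of that collected region can serve as a feature. The sketch below is a simplified assumption, not the authors' method: it considers only the upper profile of a binary character image (the first foreground pixel in each column) and counts how many pixels of water that profile would hold. The function names `upper_profile_heights` and `top_reservoir_area` are hypothetical.

```python
from typing import List, Sequence


def upper_profile_heights(image: Sequence[Sequence[int]]) -> List[int]:
    """Height of the topmost foreground pixel in each column, measured from the bottom row.

    `image` is a binary grid (1 = ink, 0 = background); row 0 is the top.
    A column with no ink has height 0.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    heights = []
    for c in range(n_cols):
        h = 0
        for r in range(n_rows):
            if image[r][c]:
                h = n_rows - r  # distance from the bottom of the image
                break
        heights.append(h)
    return heights


def top_reservoir_area(image: Sequence[Sequence[int]]) -> int:
    """Approximate area (in pixels) of water held when poured from the top.

    Simplifying assumption: only the upper profile is used, so enclosed
    holes and overhangs below the first ink pixel are ignored.
    """
    heights = upper_profile_heights(image)
    n = len(heights)
    if n == 0:
        return 0

    # Highest wall seen so far when scanning from the left.
    left_max = [0] * n
    running = 0
    for i, h in enumerate(heights):
        running = max(running, h)
        left_max[i] = running

    # Scan from the right; water in a column rises to the lower of the two walls.
    area = 0
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, heights[i])
        area += min(left_max[i], running) - heights[i]
    return area


# A small "U"-shaped glyph: two vertical strokes joined at the bottom.
u_shape = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(top_reservoir_area(u_shape))
```

A shape that is open at the top, such as the `u_shape` glyph, yields a positive area, while a flat stroke yields zero. Feeding the same routine a rotated or flipped image would give analogous left, right, or bottom reservoir measures under the same simplification.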