Word segmentation is the most critical pre-processing step for any handwritten document recognition/retrieval system. This paper describes an approach to separating a line of unconstrained handwritten text (text written in a natural manner) into words. When the writing style is unconstrained, recognition of individual components may be unreliable, so the components must be grouped into word hypotheses before recognition algorithms can be applied. Our approach uses a set of both local and global features, motivated by the way human beings perform this kind of task. In addition, to overcome the disadvantages of individual distance measures, we propose an average distance computed using three different methods. The system is evaluated on an unconstrained handwriting database containing 50 pages of handwritten documents (1,026 lines, 7,562 word images). The overall accuracy is 90.82%, which is better than that of a previous method.
Chen Huang, Sargur N. Srihari
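
The idea of averaging several gap measures between adjacent components can be illustrated with a minimal sketch. The abstract does not name the three distance methods, so the three used here (bounding-box gap, minimum Euclidean distance, and minimum run-length distance) are assumptions chosen for illustration. The fixed threshold in `segment_words` is likewise a hypothetical stand-in for the feature-based decision the paper describes. Components are assumed to be non-empty sets of `(x, y)` pixel coordinates from a single text line.

```python
from math import hypot

def bbox_gap(a, b):
    """Horizontal gap between bounding boxes (0 if they overlap)."""
    return max(0, min(x for x, _ in b) - max(x for x, _ in a))

def euclidean_gap(a, b):
    """Minimum Euclidean distance between any two pixels of a and b."""
    return min(hypot(bx - ax, by - ay) for ax, ay in a for bx, by in b)

def run_length_gap(a, b):
    """Minimum horizontal distance over the rows both components occupy.

    Falls back to the bounding-box gap when no row is shared.
    """
    right_edge = {}  # row -> rightmost x of a
    for x, y in a:
        right_edge[y] = max(right_edge.get(y, x), x)
    left_edge = {}   # row -> leftmost x of b
    for x, y in b:
        left_edge[y] = min(left_edge.get(y, x), x)
    shared = right_edge.keys() & left_edge.keys()
    if not shared:
        return bbox_gap(a, b)
    return max(0, min(left_edge[y] - right_edge[y] for y in shared))

def average_gap(a, b):
    """Average of the three (assumed) gap measures."""
    return (bbox_gap(a, b) + euclidean_gap(a, b) + run_length_gap(a, b)) / 3.0

def segment_words(components, threshold):
    """Group components into words, splitting where the average gap
    exceeds a threshold (a hypothetical rule, not the paper's classifier)."""
    if not components:
        return []
    comps = sorted(components, key=lambda c: min(x for x, _ in c))
    words = [[comps[0]]]
    for prev, cur in zip(comps, comps[1:]):
        if average_gap(prev, cur) > threshold:
            words.append([cur])      # large gap: start a new word
        else:
            words[-1].append(cur)    # small gap: same word
    return words

# Usage: three toy components; the first two are close, the third is far.
c1 = {(0, 0), (0, 1), (1, 1)}
c2 = {(3, 0), (3, 1)}
c3 = {(10, 0), (10, 1)}
words = segment_words([c3, c1, c2], threshold=4)
```

Averaging is meant to dampen the failure cases of any single measure: for example, a bounding-box gap collapses to zero when slanted strokes overlap horizontally, while the Euclidean distance still reflects the true separation.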