A segmentation algorithm that can detect the different regions of a handwritten document, such as text lines, tables, and sketches, would be extremely useful in a variety of applications, including retrieval, translation, and genre classification. However, this task is extremely challenging for handwritten documents, which vary considerably in structure and content. In this paper, we describe a robust segmentation method for detecting the regions in an unstructured on-line handwritten document. We use the temporal information in on-line documents, along with their spatial layout, to improve the segmentation results. The properties of handwritten strokes are computed using a spline-based representation. We compute the most likely segmentation of the handwritten page using a parser based on a Stochastic Context-Free Grammar. The regions considered in this work include paragraphs, text lines, words, and non-text regions.
Anoop M. Namboodiri, Anil K. Jain
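The idea of computing the most likely segmentation with a Stochastic Context-Free Grammar can be illustrated with a minimal Viterbi-style CYK sketch. It is not the parser described in the paper: the grammar, the symbols PAGE, TEXT, and NONTEXT, the stroke labels "t" and "n", and all probabilities are hypothetical choices made only for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form. The symbols and
# probabilities are illustrative assumptions, not taken from the paper.
LEXICAL = {            # nonterminal -> {terminal: probability}
    "TEXT": {"t": 0.6},
    "NONTEXT": {"n": 0.7},
}
BINARY = [             # (parent, left child, right child, probability)
    ("TEXT", "TEXT", "TEXT", 0.4),
    ("NONTEXT", "NONTEXT", "NONTEXT", 0.3),
    ("PAGE", "TEXT", "NONTEXT", 0.5),
    ("PAGE", "NONTEXT", "TEXT", 0.3),
    ("PAGE", "PAGE", "PAGE", 0.2),
]


def viterbi_cyk(tokens, lexical=LEXICAL, binary=BINARY, start="PAGE"):
    """Return (log probability, parse tree) of the best parse, or None."""
    n = len(tokens)
    # best[(i, j)][A] = (log prob, tree) for symbol A spanning tokens[i:j]
    best = defaultdict(dict)

    # Width-1 spans: apply the lexical rules to each observed stroke label.
    for i, tok in enumerate(tokens):
        for nt, emissions in lexical.items():
            if tok in emissions:
                best[(i, i + 1)][nt] = (math.log(emissions[tok]), (nt, tok))

    # Wider spans: try every split point and every binary rule, keeping
    # only the highest-scoring derivation for each symbol and span.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = best[(i, j)]
            for k in range(i + 1, j):
                left, right = best[(i, k)], best[(k, j)]
                for parent, b, c, p in binary:
                    if b in left and c in right:
                        score = math.log(p) + left[b][0] + right[c][0]
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (parent, left[b][1], right[c][1]))

    return best[(0, n)].get(start)


# Example: two text-like strokes followed by one non-text stroke.
result = viterbi_cyk(["t", "t", "n"])
```

In this sketch the returned tree groups the two "t" strokes under one TEXT node and the "n" stroke under NONTEXT, which is the kind of region grouping a grammar-based segmenter aims to recover. A realistic system would replace the fixed stroke labels with likelihoods derived from stroke features.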