In this paper, we present a novel graph-based method for extracting handwritten text lines from monochromatic Arabic document images. Our approach consists of two steps: coarse text line estimation using primary components, which define the line, and assignment of diacritic components, which are more difficult to associate with a given line. We first estimate the local orientation at each primary component to build a sparse similarity graph. We then use a shortest-path algorithm to compute similarities between non-neighboring components. From this graph, we obtain coarse text lines using two estimates obtained from affinity propagation and breadth-first search. In the second step, we assign secondary components to a text line. The proposed method is fast and robust to the non-uniform skew and character size variations normally present in handwritten text lines. We evaluate our method using a pixel-matching criterion and report 96% accuracy on a dataset of 125 Arabic document images. We also ...
Jayant Kumar, Wael Abd-Almageed, Le Kang, David S.
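
The graph stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity is computed, so the edge rule here (link two components when they are close and the connecting direction agrees with both local orientations), the Euclidean edge cost, and the thresholds `max_dist` and `max_angle` are all assumptions made for illustration. The function names are hypothetical as well. Only the breadth-first search estimate is shown; the affinity propagation estimate and the diacritic assignment step are omitted.

```python
import heapq
import math
from collections import deque


def angle_diff(a, b):
    """Smallest difference between two undirected orientations, in [0, pi/2]."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def build_graph(components, max_dist=30.0, max_angle=math.radians(20)):
    """Build a sparse graph over primary components.

    components: list of (x, y, theta), where theta is the local orientation
    in radians. The linking rule and both thresholds are assumptions.
    """
    graph = {i: {} for i in range(len(components))}
    for i, (xi, yi, ti) in enumerate(components):
        for j in range(i + 1, len(components)):
            xj, yj, tj = components[j]
            dist = math.hypot(xj - xi, yj - yi)
            if dist == 0 or dist > max_dist:
                continue  # keep the graph sparse: only nearby components
            link = math.atan2(yj - yi, xj - xi)
            # Link only if the connecting direction matches both orientations.
            if max(angle_diff(link, ti), angle_diff(link, tj)) <= max_angle:
                graph[i][j] = dist
                graph[j][i] = dist
    return graph


def shortest_path_costs(graph, source):
    """Dijkstra: path cost from source to every reachable component."""
    costs = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > costs.get(node, math.inf):
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < costs.get(nbr, math.inf):
                costs[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return costs


def coarse_lines(graph):
    """Breadth-first search: each connected component is one coarse line."""
    seen, lines = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, line = deque([start]), []
        while queue:
            node = queue.popleft()
            line.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        lines.append(sorted(line))
    return lines
```

For two horizontal rows of three components each, spaced 20 pixels apart within a row and 50 pixels between rows, `coarse_lines(build_graph(components))` groups the indices as `[[0, 1, 2], [3, 4, 5]]`. `shortest_path_costs` gives non-adjacent components on the same row a finite cost and leaves components on the other row unreachable.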