Text extraction is an important phase in document recognition systems. To segment text from a document page, it is necessary to detect all possible manuscript text regions. In this article we propose an efficient algorithm for segmenting handwritten text lines. The algorithm uses a morphological operator to extract features from the image. Next, a sequence of histogram projections and recovery steps is used to obtain the segmented line regions of the text. First, a Y histogram projection is performed, which yields the positions of the text lines. A threshold is applied to divide the lines into separate regions. After that, another threshold is used to eliminate false lines. These procedures, however, cause some loss of text line area, so a recovery method is proposed to minimize this effect. To detect the extreme positions of the text in the horizontal direction, an X histogram projection is applied. Then, as in the Y direction, another threshold is used...
Rodolfo P. dos Santos, Gabriela S. Clemente, Ing R
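The projection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters (`split_thresh`, `min_height`, `recover`, `x_thresh`), and the specific threshold and recovery rules are assumptions chosen for illustration, and the morphological preprocessing step is omitted. The input is assumed to be a binary image stored as a list of rows, with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed format)


def y_projection(img: Image) -> List[int]:
    """Ink count per row (Y histogram projection)."""
    return [sum(row) for row in img]


def x_projection(img: Image, top: int, bottom: int) -> List[int]:
    """Ink count per column, restricted to rows top..bottom-1."""
    width = len(img[0])
    return [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]


def runs_above(profile: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) runs where the profile exceeds the threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image, split_thresh: int = 1, min_height: int = 2,
                  recover: int = 1, x_thresh: int = 0) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes, all bounds half-open."""
    height = len(img)
    # Step 1: the Y projection, thresholded, splits the page into row bands.
    bands = runs_above(y_projection(img), split_thresh)
    # Step 2: a second threshold (here, a minimum band height) drops false lines.
    bands = [(t, b) for t, b in bands if b - t >= min_height]
    boxes = []
    for top, bottom in bands:
        # Step 3: recovery. Grow each band by `recover` rows to regain rows lost
        # to the first threshold. Assumed rule: clamp to the image bounds only,
        # so closely spaced bands may overlap.
        top, bottom = max(0, top - recover), min(height, bottom + recover)
        # Step 4: the X projection inside the band gives the horizontal extremes.
        cols = runs_above(x_projection(img, top, bottom), x_thresh)
        if cols:
            boxes.append((top, bottom, cols[0][0], cols[-1][1]))
    return boxes


# Hypothetical 10x8 test page: two text lines and one single-row noise band.
DEMO: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],  # faint ascender row, lost by the first threshold
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],  # noise: too short to count as a line
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for box in segment_lines(DEMO):
        print(box)
```

In this sketch the faint row above the first line falls below the splitting threshold and is regained only by the recovery step, which mirrors the loss-and-recovery behaviour the abstract describes. How the thresholds are actually chosen is not stated in the text above, so the defaults here are placeholders.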