Improved document image segmentation algorithm using multiresolution morphology

Page segmentation into text and non-text components is an essential preprocessing step before OCR. If it is not done properly, the OCR classification engine produces garbage text because of the non-text components. This paper describes improvements to the text/image segmentation algorithm described by Bloomberg [1], which is also available in his open-source Leptonica library [2]. The modifications result in significant improvements over Bloomberg’s algorithm on the UW-III, UNLV, and ICDAR 2009 page segmentation competition test images and on circuit diagram datasets.
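The abstract does not spell out the algorithm, so the following is only a rough illustration of the general multiresolution-morphology idea behind text/image segmentation: reduce the binary page, open it so thin text strokes vanish while solid image regions survive as seeds, then grow the seeds back inside the reduced page. It is a minimal sketch under stated assumptions, not the paper's improved method and not Leptonica's API. All function names, the single 2x reduction, and the 3x3 structuring element are hypothetical simplifications; practical implementations typically use more reduction levels and larger structuring elements.

```python
from collections import deque


def threshold_reduce_2x(img, threshold):
    """Halve each dimension: an output pixel is 1 when at least
    `threshold` of the 4 pixels in its 2x2 source block are 1."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            1 if (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                  + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) >= threshold
            else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def _morph3(img, combine):
    """Apply a 3x3 neighbourhood test; out-of-bounds pixels count as 0."""
    h, w = len(img), len(img[0])
    return [
        [
            int(combine(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def open3(img):
    """Morphological opening: erosion followed by dilation (3x3 square)."""
    return _morph3(_morph3(img, all), any)


def seed_fill(seed, mask):
    """Grow seed pixels through 8-connected ON pixels of the mask."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if seed[y][x] and mask[y][x]:
                out[y][x] = 1
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not out[ny][nx]:
                    out[ny][nx] = 1
                    queue.append((ny, nx))
    return out


def nontext_mask(img):
    """Return a half-resolution mask of candidate non-text regions."""
    reduced = threshold_reduce_2x(img, 1)  # keep any ink in each 2x2 block
    seed = open3(reduced)                  # thin strokes vanish, solid areas stay
    return seed_fill(seed, reduced)        # recover full extent of seeded regions
```

In this sketch the input is a list of rows of 0/1 pixels with even dimensions. The returned mask is at half resolution and would need to be expanded back to the page size before being used to remove non-text pixels ahead of OCR.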
Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breuel
Added 19 Dec 2011
Updated 19 Dec 2011
Type Journal
Year 2011
Where DRR
Authors Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breuel