Segmenting a page into text and non-text components is an essential preprocessing step before OCR. If this is not done properly, an OCR classification engine produces g...
Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breu...
High-level understanding of the mathematical content of a document image clearly requires techniques for math-zone extraction and recognition. In this pape...
S. P. Chowdhury, S. Mandal, Amit Kumar Das, Bhabat...
A considerable fraction of an XML document consists of markup, that is, begin- and end-element tags describing the document’s tree structure. XML compression tools such as XMill ...
Page segmentation algorithms in the published literature often rely on predetermined parameters such as general font sizes, distances between text lines and document scan ...