There is a plethora of established and proposed document representation formats, but none that can adequately support individual stages within an entire sequence of document image ...
With the aim of extracting structural information from the table of contents (TOC) to help develop a digital document library, the requirement of identifying/segmenting the TOC page ...
S. Mandal, S. P. Chowdhury, Amit Kumar Das, Bhabat...
Page segmentation into text and non-text components is an essential preprocessing step before OCR operation. If this is not done properly, an OCR classification engine produces g...
Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breu...
There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents a...
The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and impro...