An Efficient Word Segmentation Technique for Historical and Degraded Machine-Printed Documents



Word segmentation is a crucial step for segmentation-free document analysis systems and is used for creating an index based on word matching. In this paper, we propose a novel methodology for word segmentation in historical and degraded machine-printed documents. The proposed technique addresses problems such as text of varying size, text and non-text areas lying very close together, and non-straight or warped text lines. It is based on: (i) a dynamic run length smoothing algorithm that helps group together homogeneous text regions, (ii) removal of noise and punctuation marks, together with obstacle detection, to facilitate the segmentation process, and (iii) a draft text line estimation procedure that guides the final word segmentation result. After testing on numerous historical and degraded machine-printed documents, we found that our methodology performs better than current state-of-the-art word segmentation techniques for historical and degraded machin...
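To illustrate the run length smoothing idea mentioned in step (i), here is a minimal sketch of the classic horizontal run length smoothing algorithm with a single fixed threshold. It is not the dynamic variant proposed in the paper, whose details are not given in this abstract. The function names `rlsa_horizontal` and `foreground_runs`, the `max_gap` parameter, and the 0/1 pixel encoding are all assumptions made for illustration.

```python
def rlsa_horizontal(row, max_gap):
    """Fixed-threshold horizontal run length smoothing (illustrative only).

    `row` is a list of 0 (background) and 1 (foreground) pixels.
    Background runs of length <= max_gap that lie between two
    foreground pixels are filled with foreground.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= max_gap:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def foreground_runs(row):
    """Return (start, end) index pairs, inclusive, for each run of 1s."""
    runs = []
    start = None
    for i, px in enumerate(row):
        if px == 1 and start is None:
            start = i
        elif px != 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


# Hypothetical scanline: two characters 2 pixels apart, then a 4-pixel gap.
scanline = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
smoothed = rlsa_horizontal(scanline, max_gap=2)
candidates = foreground_runs(smoothed)
```

With `max_gap=2`, the short gap between the first two foreground pixels is filled while the longer gap is preserved, so the smoothed scanline yields two candidate word spans. A fixed threshold like this struggles when text size varies across the page, which is presumably the kind of situation a dynamic threshold is meant to handle.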

Michael Makridis, N. Nikolaou, Basilios Gatos


Degraded Machineprinted Documents | Document Analysis | ICDAR 2007 | Word Segmentation | Word Segmentation Techniques |



Post Info

Added	03 Jun 2010
Updated	03 Jun 2010
Type	Conference
Year	2007
Where	ICDAR
Authors	Michael Makridis, N. Nikolaou, Basilios Gatos

