Word segmentation is the most critical pre-processing step for any handwritten document recognition/retrieval system. This paper describes an approach to separate a line of uncons...
Thispaper presents a text word extraction algorithm that takes a set of bounding boxes of glyphs and their associated text lines of a given document andpartitions the glyphs into ...
A web-portal providing access to over 250.000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
We present a new method for blind document bleed through removal based on separate Markov Random Field (MRF) regularization for the recto and for the verso side, where separate pri...
We address the problem of publishing parliamentary proceedings in a digital sustainable manner. We give an extensive requirements analysis, and based on that propose a uniform XML...