Glyph extraction from historic document images

This paper is about the reproduction of ancient texts with vectorised fonts. While only recognition rates count for OCR, a reproduction process does not necessarily require characters to be recognised. Our system aims to extract all characters from printed historic documents without employing knowledge of language, font, or writing system. It searches for the best prototypes and creates a document-specific font from these glyphs. To reach this goal, many common OCR preprocessing steps are no longer adequate. We describe the necessary changes to our system, which deals particularly with documents typeset in Fraktur. On the one hand, algorithms are described that extract glyphs accurately for the purpose of precise reproduction. On the other hand, classification results of extracted Fraktur glyphs are presented for different shape descriptors.

Categories and Subject Descriptors: I.4.3 [Image Processing and Computer Vision]: Enhancement; I.5.3 [Pattern Recognition]: Clustering G...
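The general idea of extracting glyphs without language or font knowledge and then choosing one prototype per group can be illustrated with a small sketch. This is not the authors' algorithm: the connected-component labelling, the Hamming distance on padded bitmaps, the greedy clustering, the medoid as "best prototype", and all names (`PAGE`, `connected_components`, `crop`, `distance`, `cluster`, `prototype`) are assumptions chosen only for illustration, and the toy page is made-up data.

```python
from collections import deque

def connected_components(img):
    """Return the 8-connected foreground components as lists of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps

def crop(pixels):
    """Cut a component out as a tight binary bitmap (tuple of row tuples)."""
    y0 = min(p[0] for p in pixels)
    x0 = min(p[1] for p in pixels)
    h = max(p[0] for p in pixels) - y0 + 1
    w = max(p[1] for p in pixels) - x0 + 1
    bitmap = [[0] * w for _ in range(h)]
    for y, x in pixels:
        bitmap[y - y0][x - x0] = 1
    return tuple(tuple(row) for row in bitmap)

def distance(a, b):
    """Hamming distance of two bitmaps, top-left aligned and zero-padded."""
    h, w = max(len(a), len(b)), max(len(a[0]), len(b[0]))
    def px(g, y, x):
        return g[y][x] if y < len(g) and x < len(g[0]) else 0
    return sum(px(a, y, x) != px(b, y, x) for y in range(h) for x in range(w))

def cluster(glyphs, threshold):
    """Greedy clustering: join the first cluster whose seed is close enough."""
    clusters = []
    for g in glyphs:
        for c in clusters:
            if distance(g, c[0]) <= threshold:
                c.append(g)
                break
        else:
            clusters.append([g])
    return clusters

def prototype(members):
    """Medoid: the member with the smallest total distance to the others."""
    return min(members, key=lambda g: sum(distance(g, o) for o in members))

# Made-up toy page: three vertical bars (one with a noise pixel) and a dash.
PAGE = ["1.1.11.....",
        "1.1.1..111.",
        "1.1.1......"]
image = [[1 if ch == "1" else 0 for ch in row] for row in PAGE]

glyphs = [crop(c) for c in connected_components(image)]
clusters = cluster(glyphs, threshold=1)
prototypes = [prototype(c) for c in clusters]
```

In this sketch the noisy bar falls into the same cluster as the two clean bars, and the medoid picks a clean one as the representative. A real system for Fraktur would need a far more robust shape descriptor and segmentation than this bitmap comparison.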

Lothar Meyer-Lerbs, Arne Schuldt, Björn Gottf
