In order to preserve our cultural heritage and for automated document processing libraries and national archives have started digitizing historical documents. In the case of degra...
Florian Kleber, Robert Sablatnig, Melanie Gau, Hei...
A crucial preprocessing stage in applications such as OCR is text extraction from mixed-type documents. The present work, in contrast to most until now, successfully faces the pro...
Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...
In this paper, an efficient and computationally fast method for segmenting text and graphics part of document images based on textural cues is presented. We assume that the graphic...
We present a paradigm for uniting the diverse strands of XML-based Web technologies by allowing them to be incorporated within a single document. This overcomes the distinction be...