This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristic starts from an initial set of basic content elements an...
This paper proposes a multi-signature document identification method that works robustly with low-resolution documents captured from handheld devices. The proposed method is based ...
During the last decade, national archives, libraries, museums and companies started to make their records, books and files electronically available. In order to allow efficient ac...
Andreas Stoffel, David Spretke, Henrik Kinnemann, ...
This paper presents the XML-based formats ALTO, TEI, and METS used for Digital Libraries and their interest for data representation in a Document Image Analysis and Recognition (DIAR)...
Form document analysis is one of the most essential tasks in document analysis and recognition. One of the most fundamental and crucial tasks is the extraction of the reference li...