We propose a two-state Markov chain model of degraded document images. The model generates random and burst noise to simulate isolated pixel reversal as well as blurring of a larg...
This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text, corre...
In this paper, we propose a novel segmentation-free approach for keyword search in historical typewritten documents combining image preprocessing, synthetic data creation, word sp...
Basilios Gatos, Thomas Konidaris, Kostas Ntzios, I...
Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We p...
Word segmentation is a crucial step for segmentation-free document analysis systems and is used for creating an index based on word matching. In this paper, we propose a novel met...