Abstract. This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, an...
Abstract. Training and evaluation of techniques for handwriting recognition and retrieval is a challenge given that it is difficult to create large ground-truthed datasets. This is...
Abstract. Automatic identification of a script in a given document image facilitates many important applications such as automatic archiving of multilingual documents, searching on...
Gopal Datt Joshi, Saurabh Garg, Jayanthi Sivaswamy
This paper presents a novel approach for designing a semi-automatic adaptive OCR for large document image collections in digital libraries. We describe an interactive system for co...
Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, ...
This paper describes a Neural Network (NN) approach for logical document structure extraction. In this NN architecture, called Transparent Neural Network (TNN), the document struct...
Engineering diagnosis often involves analyzing complex records of system states printed to large, textual log files. Typically the logs are designed to accommodate the widest debug...
This work presents our first contribution to the discrimination of the medieval manuscript texts in order to assist the palaeographers to date the ancient manuscripts. Our method i...
Ikram Moalla, Frank Lebourgeois, Hubert Emptoz, Ad...
We present in this paper a system for converting PDF legacy documents into structured XML format. This conversion system first extracts the different streams contained in PDF files...