We describe a component of a document analysis system for constructing ontologies for domain-specific web tables imported into Excel. This component automates extraction of the Wa...
Sharad C. Seth, Ramana Chakradhar Jandhyala, Mukka...
We consider the problem of dealing with irrelevant votes when a multi-case classifier is built from an ensemble of binary classifiers. We show how run-off elections can be used to...
A novel text extraction method from graphical document images is presented in this paper. Graphical document images containing text and graphics components are considered as two-d...
Most of the Indian scripts do not have any robust commercial OCRs. Many of the laboratory prototypes report reasonable results at recognition/classification stage. However, word ...
In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents and the probability that a given document belongs to a certain ...
Albert Gordo, Jaume Gibert, Ernest Valveny, Mar&cc...
Matching word images has many applications in document recognition and retrieval systems. Dynamic Time Warping (DTW) is popularly used to estimate the similarity between word imag...
Conventional optical character recognition (OCR) systems operate on individual characters and words, and do not normally exploit document or collection context. We describe a Coll...
K. Pramod Sankar, C. V. Jawahar, Raghavan Manmatha
This paper addresses how to quickly recognize a character pattern using a lot of case examples without learning. Here without learning means just finding the most similar example...
Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images, attempting to improve accuracy by automatic unsupervised ...
In this paper we present a new database of online handwritten documents with different contents such as text, drawings, diagrams, formulas, tables, lists, and markings. It was de...