Devanagari script is a two-dimensional composition of symbols. It is highly cumbersome to treat each composite character as a separate atomic symbol because such combinations are v...
In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text from machine-printed documents. Each node of the tree-structured cla...
In offline handwritten text recognition, the separation of touching characters remains a challenge due to the variability of touching structures. This paper proposes a ...
Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...
We present a document understanding system in which the arrangement of lines of text and block separators within a document is modeled by stochastic context-free grammars. A gram...
John C. Handley, Anoop M. Namboodiri, Richard Zani...