We discuss problems in developing policies for ground truthing document images for pixel-accurate segmentation. First, we describe ground truthing policies that apply to four diff...
Named Entity Recognition (NER) is an important subtask of document processing tasks such as Information Extraction. This paper describes an NER algorithm that uses a Multi-Layer Percept...
In this paper we explore the effectiveness of three clustering methods used to perform word image indexing. The three methods are: the Self-Organizing Map (SOM), the Growing Hiera...
As a result of well-publicized security concerns with direct recording electronic (DRE) voting, there is a growing call for systems that employ some form of paper artifact to prov...
Daniel P. Lopresti, George Nagy, Elisa H. Barney S...
Certain forms of mathematical expression are used more often than others in practice. A quantitative understanding of actual usage can provide additional information to improve th...
We report an improved methodology for training a sequence of classifiers for document image content extraction, that is, the location and segmentation of regions containing handwr...
Patent document images maintained by the U.S. patent database have a specific format, in which figures and text descriptions are separated into different sections. This makes it d...