Sciweavers

ICDAR
2007
IEEE

Content-level Annotation of Large Collection of Printed Document Images

Download cvit.iiit.ac.in

A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creating annotated corpora is tedious and laborious, especially when the annotation is at the character level. In this paper, we propose an efficient hierarchical approach for annotating large collections of printed document images. We align document images with independently keyed-in text. The method is model-driven and is intended to annotate, at the character level, large collections of documents scanned at three different resolutions. We employ an XML representation to store the annotation information. APIs are provided for content-level access, for easy use in training and evaluating OCRs and in other document understanding tasks.
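To make "content-level access" concrete, here is a minimal sketch assuming a hypothetical page → line → word → character XML layout. The element and attribute names (`page`, `line`, `word`, `char`, `bbox`, `text`, `image`, `dpi`), the sample file name, and the helpers `words`, `char_samples`, and `inconsistent_words` are illustrative assumptions, not the schema or APIs described in the paper. The sketch uses Python's standard `xml.etree.ElementTree` to collect character bounding boxes for a given label (the kind of query used to gather OCR training samples) and to check that each word's text agrees with its annotated characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one page image. The schema below is an
# illustrative assumption, not the format used in the paper.
SAMPLE = """
<page image="doc001_p1.tif" dpi="300">
  <line bbox="10,20,200,60">
    <word bbox="10,20,90,60" text="the">
      <char bbox="10,20,35,60" text="t"/>
      <char bbox="36,20,62,60" text="h"/>
      <char bbox="63,20,90,60" text="e"/>
    </word>
    <word bbox="110,20,200,60" text="cat">
      <char bbox="110,20,140,60" text="c"/>
      <char bbox="141,20,170,60" text="a"/>
      <char bbox="171,20,200,60" text="t"/>
    </word>
  </line>
</page>
""".strip()


def parse_bbox(s):
    """Convert an 'x0,y0,x1,y1' string into a tuple of ints."""
    return tuple(int(v) for v in s.split(","))


def words(root):
    """Return the word-level text in document order."""
    return [w.get("text") for w in root.iter("word")]


def char_samples(root, label):
    """Return bounding boxes of every character annotated as `label`,
    e.g. to crop training samples for a character classifier."""
    return [parse_bbox(c.get("bbox"))
            for c in root.iter("char") if c.get("text") == label]


def inconsistent_words(root):
    """List words whose text differs from the concatenation of their
    child characters: a simple check that two annotation levels agree."""
    bad = []
    for w in root.iter("word"):
        joined = "".join(c.get("text", "") for c in w.findall("char"))
        if joined != w.get("text"):
            bad.append(w.get("text"))
    return bad


root = ET.fromstring(SAMPLE)
print(words(root))              # ['the', 'cat']
print(char_samples(root, "t"))  # [(10, 20, 35, 60), (171, 20, 200, 60)]
print(inconsistent_words(root))  # []
```

Keeping every level in one tree means a line-, word-, or character-level query differs only in the tag passed to `iter`. A real corpus scanned at several resolutions would also need per-page image and resolution metadata, which this sketch only hints at with the `image` and `dpi` attributes.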

Anand Kumar 0002, C. V. Jawahar

Character Level | Document Analysis | Document Images | ICDAR 2007 | Large Collection

Related Content

» SignatureBased Document Image Retrieval

» Probabilistic Reverse Annotation for Large Scale Image Retrieval

» Retrieval from Document Image Collections

» A Generic Architecture for the Conversion of Document Collections into Semantically Annota...

» Efficient Word Retrieval by Means of SOM Clustering and PCA

» Devising Interactive Access Techniques for Indian Language Document Images

» ZooMICSS a zoomable map image collection sensemaking system the Katrina Rita context

» A General System for the Retrieval of Document Images from Digital Libraries

» Multiscale Structural Saliency for Signature Detection

Post Info

Added	03 Jun 2010
Updated	03 Jun 2010
Type	Conference
Year	2007
Where	ICDAR
Authors	Anand Kumar 0002, C. V. Jawahar