A search engine for historical manuscript images


Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to make them publicly accessible. These collections exist only as images, so making them searchable requires expensive manual annotation. Current handwriting recognizers have word error rates in excess of 50% on such material and therefore cannot be used to transcribe it. We describe two statistical models for retrieving handwritten manuscripts from large collections given a text query. Both use a set of transcribed page images to learn a joint probability distribution over features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. We report experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection. The precision at 20 documents is about 0.4 to 0.5, depending on the model. ...
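The abstract reports precision at 20 documents as its evaluation measure. As a rough illustration of how that figure is typically computed, here is a minimal sketch in Python. The function name `precision_at_k`, the page identifiers, and the convention of dividing by k even when fewer than k results are returned are all assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Return the fraction of the top-k ranked items that are relevant.

    Assumption: the denominator is always k, so a ranking shorter than k
    is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Toy ranking with made-up page identifiers (not data from the paper).
ranked = ["p3", "p7", "p1", "p9"]
relevant = {"p3", "p1"}
score = precision_at_k(ranked, relevant, k=4)  # 2 relevant pages in the top 4
```

Under this convention, a precision at 20 of 0.4 to 0.5 would mean that roughly 8 to 10 of the first 20 retrieved pages are relevant to the query.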

Toni M. Rath, R. Manmatha, Victor Lavrenko


Handwritten Historical Manuscripts | Historical Manuscripts | Image | SIGIR 2004 |


» Games of Inquiry for Collaborative Concept Structuring

» Accessing the content of Greek historical documents

» Finding Motifs in a Database of Shapes

» Boosted decision trees for word recognition in handwritten document retrieval

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	SIGIR
Authors	Toni M. Rath, R. Manmatha, Victor Lavrenko
