The traditional retrieval models based on term matching are not effective in collections of degraded documents (output of OCR or ASR systems for instance). This paper presents a n...
This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text, corre...
This paper describes a work-flow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR scanned text. While the most recently available...
Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software such as docu...
It is a well documented fact that, for human readers, familiar text is more legible than unfamiliar text. Current-generation computer vision systems also are able to exploit some ...