Context-Sensitive Error Correction: Using Topic Models to Improve OCR


Modern optical character recognition software relies on human interaction to correct misrecognized characters. Even though the software often reliably identifies low-confidence output, the simple language and vocabulary models employed are insufficient to automatically correct mistakes. This paper demonstrates that topic models, which automatically detect and represent an article’s semantic context, reduce error by 7% over a global word distribution in a simulated OCR correction task. Detecting and leveraging context in this manner is an important step towards improving OCR.
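To make the contrast between a global word distribution and a topic-conditioned one concrete, here is a minimal Python sketch. It is not the authors' model: the two topics, their word probabilities, and the function names (`topic_posterior`, `global_prob`, `contextual_prob`, `correct`) are hypothetical, and the document is treated as coming from a single topic, which is a simplification of a real topic model such as LDA.

```python
import math

# Hypothetical toy topic-word probabilities, invented for illustration only.
TOPICS = {
    "finance": {"bank": 0.30, "loan": 0.30, "interest": 0.25,
                "river": 0.05, "tank": 0.05, "water": 0.05},
    "nature": {"bank": 0.10, "loan": 0.02, "interest": 0.03,
               "river": 0.35, "tank": 0.15, "water": 0.35},
}
SMOOTH = 1e-6  # probability given to words a topic has never seen


def topic_posterior(context_words):
    """Return p(topic | context) under a uniform prior over topics.

    Assumes the whole document is drawn from one topic, which is a
    simplification of the mixed-membership assumption in LDA.
    """
    log_scores = {
        name: sum(math.log(dist.get(w, SMOOTH)) for w in context_words)
        for name, dist in TOPICS.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {name: math.exp(s - top) for name, s in log_scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def global_prob(word):
    """Context-free baseline: average the topic distributions."""
    return sum(d.get(word, SMOOTH) for d in TOPICS.values()) / len(TOPICS)


def contextual_prob(word, posterior):
    """Word probability weighted by the inferred topic posterior."""
    return sum(posterior[t] * TOPICS[t].get(word, SMOOTH) for t in TOPICS)


def correct(candidates, context_words=None):
    """Pick the most probable candidate for a low-confidence word."""
    if context_words is None:
        return max(candidates, key=global_prob)
    posterior = topic_posterior(context_words)
    return max(candidates, key=lambda w: contextual_prob(w, posterior))
```

With the candidates `["bank", "tank"]`, the global baseline prefers "bank", while supplying the context words `["river", "water"]` shifts the choice to "tank". The numbers carry no empirical meaning; they only show how surrounding words can change which candidate ranks first.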

Michael L. Wick, Michael G. Ross, Erik G. Learned-Miller


Article’s Semantic Context | Document Analysis | ICDAR 2007 | Optical Character Recognition Software | Simulated OCR Correction


» Improving State-of-the-Art OCR through High-Precision Document-Specific Modeling

» Scaling Up Whole-Book Recognition

» Identifying Modeling Errors in Signatures by Model Checking

» Incorporating Linguistic Model Adaptation into Whole-Book Recognition

» Analysis of whole-book recognition

» Towards Whole-Book Recognition

» Semi-supervised learning of semantic classes for query understanding from the web and for t...

» Automatically generating related queries in Japanese

Post Info

Added	03 Jun 2010
Updated	03 Jun 2010
Type	Conference
Year	2007
Where	ICDAR
Authors	Michael L. Wick, Michael G. Ross, Erik G. Learned-Miller


Sciweavers
