ICPR
2010
IEEE

Incorporating Linguistic Model Adaptation into Whole-Book Recognition

Abstract: Whole-book recognition is a document image analysis strategy that operates on the complete set of a book’s page images, using automatic adaptation to improve accuracy. Our algorithm expects to be given approximate iconic and linguistic models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries, and then, guided entirely by evidence internal to the test set, corrects the models, yielding improved accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. In previous work, we reported that adapting the iconic model alone (with a perfect linguistic model) automatically reduced the word error rate on a 180-page book by a large factor. In this paper, we propose an algorithm that adapts the iconic model and the linguistic model alternately, improving both on the fly. The linguistic model adaptation method, wh...
Pingping Xiu, Henry S. Baird
Added 07 Dec 2010
Updated 07 Dec 2010
Type Conference
Year 2010
Where ICPR
Authors Pingping Xiu, Henry S. Baird