We describe experimental results for unsupervised recognition of the textual contents of book images using fully automatic, mutual-entropy-based model adaptation. Each experiment starts with approximate iconic and linguistic models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries, and then runs a fully automatic adaptation algorithm that, guided entirely by evidence internal to the test set, attempts to correct the models to improve accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. Our adaptation algorithm detects disagreements between the models by analyzing the mutual entropy between (1) the a posteriori probability distribution of character classes (the recognition results from image classification alone) and (2) the a posteriori probability distribution of word classes (the recognition results from image classification c...
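The abstract does not spell out how the disagreement score is computed. The following is only a minimal sketch under stated assumptions, not the authors' method: "mutual entropy" is approximated here by a Kullback-Leibler divergence; word posteriors are formed by multiplying a word prior from a lexicon by the classifier's per-position character posteriors; and those word posteriors are then marginalized back to per-position character distributions so that they can be compared with the iconic-only ones. All function names and the toy lexicon are hypothetical.

```python
import math
from typing import Dict, List

CharDist = Dict[str, float]  # character class -> posterior probability


def word_posterior(char_post: List[CharDist], lexicon: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical combination rule: P(w | image) is proportional to P(w) * prod_i P(c_i = w_i | x_i)."""
    n = len(char_post)
    scores: Dict[str, float] = {}
    for word, prior in lexicon.items():
        if len(word) != n:
            continue  # only words of the observed length can explain the image
        s = prior
        for i, ch in enumerate(word):
            s *= char_post[i].get(ch, 0.0)
        if s > 0:
            scores[word] = s
    total = sum(scores.values())
    if total == 0:
        return {}
    return {w: s / total for w, s in scores.items()}


def marginal_char_post(word_post: Dict[str, float], n: int) -> List[CharDist]:
    """Per-position character distributions implied by the word posterior."""
    marg: List[CharDist] = [dict() for _ in range(n)]
    for w, p in word_post.items():
        for i, ch in enumerate(w):
            marg[i][ch] = marg[i].get(ch, 0.0) + p
    return marg


def kl(q: CharDist, p: CharDist) -> float:
    """KL(q || p); finite here because q(c) > 0 implies p(c) > 0 by construction."""
    return sum(qv * math.log(qv / p[c]) for c, qv in q.items() if qv > 0)


def disagreement(char_post: List[CharDist], lexicon: Dict[str, float]) -> float:
    """Sum over positions of the divergence between word-level and iconic-only char posteriors."""
    wp = word_posterior(char_post, lexicon)
    if not wp:
        return math.inf  # no lexicon word is consistent with the classifier output
    marg = marginal_char_post(wp, len(char_post))
    return sum(kl(q, p) for q, p in zip(marg, char_post))


# Toy example: the classifier is unsure between "a" and "o" in position 2,
# but the (hypothetical) lexicon only contains words with "a" there.
char_post = [{"c": 0.6, "e": 0.4}, {"a": 0.9, "o": 0.1}, {"t": 1.0}]
lexicon = {"cat": 0.5, "eat": 0.5}
print(disagreement(char_post, lexicon))  # approx. 0.105, i.e. log(1 / 0.9)
```

In this toy case only the second position contributes, because the lexicon removes the "o" reading that the classifier allowed. Positions with a large divergence are where the iconic and linguistic models disagree most, which is the kind of signal the abstract describes the adaptation algorithm as using.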
Pingping Xiu, Henry S. Baird