We describe a technique for linguistic post-processing of whole-book recognition results. Whole-book recognition improves the recognition of book images using fully automatic cross-entropy-based model adaptation. In previously published work, word recognition was performed on each word separately, without taking into account passage-level information such as word-occurrence frequencies. As a result, some words that are rare in real texts may appear much more often in recognition results, and vice versa. Differences between the word frequencies in recognition results and those expected from prior knowledge may therefore indicate recognition errors over a long passage. In this paper, we propose a post-processing technique that enhances whole-book recognition results by minimizing the differences between word frequencies in the recognition results and prior word frequencies. This technique
Pingping Xiu, Henry S. Baird
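
To make the idea of a frequency mismatch concrete, the following is a minimal sketch of one way to score how far the word frequencies of a recognized passage lie from prior word frequencies. The abstract does not say which difference measure is used, so the Kullback-Leibler-style divergence here, the function names `word_frequencies` and `frequency_divergence`, and the `floor` smoothing parameter are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def word_frequencies(words):
    """Relative frequency of each word in a recognized passage."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def frequency_divergence(recognized_words, prior, floor=1e-6):
    """KL-style divergence between observed and prior word frequencies.

    Assumption: `prior` maps words to prior probabilities. Words missing
    from `prior` are given the small probability `floor` (hypothetical
    smoothing choice) so that the logarithm stays finite.
    """
    observed = word_frequencies(recognized_words)
    return sum(
        p * math.log(p / max(prior.get(word, 0.0), floor))
        for word, p in observed.items()
    )


# Illustrative prior and two candidate recognition results (made-up data).
prior = {"the": 0.5, "cat": 0.25, "sat": 0.25}
consistent = ["the", "the", "cat", "sat"]   # matches the prior exactly
suspicious = ["the", "cat", "cat", "sat"]   # "cat" is over-represented

score_consistent = frequency_divergence(consistent, prior)
score_suspicious = frequency_divergence(suspicious, prior)
```

Under these assumptions, a passage whose word frequencies match the prior scores zero, and a passage in which some word is over-represented scores higher. A post-processor in this spirit could prefer, among alternative word hypotheses, those that lower the score.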