A key problem faced by classifiers is coping with styles not represented in the training set. We present an application of hierarchical Bayesian methods to the problem of recognizing degraded printed characters in a variety of fonts. The proposed method uses training data covering various styles and classes to compute prior distributions on the parameters of the class-conditional distributions. At classification time, the parameters of the actual class-conditional distributions are fitted using an EM algorithm. The advantage of hierarchical Bayesian methods is illustrated with a theoretical example. Severalfold increases in classification performance relative to style-oblivious and style-conscious classifiers are demonstrated on a multifont OCR task.
Charles Mathis, Thomas M. Breuel
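
The two stages summarized in the abstract (priors estimated from multi-style training data, then class-conditional parameters fitted by EM on data in an unseen style) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: class-conditional densities are Gaussian with a known shared noise variance `sigma2`, the prior over each class mean is Gaussian with diagonal variance, class priors are equal, and every class appears in every training style. The function names `fit_prior`, `responsibilities`, and `em_adapt` are hypothetical.

```python
import numpy as np

def fit_prior(X, classes, styles):
    """Estimate a Gaussian prior N(mu0[c], tau2[c]) over style-specific class means.

    Assumes every (class, style) pair has at least one training sample.
    """
    class_ids = np.unique(classes)
    style_ids = np.unique(styles)
    mu0, tau2 = [], []
    for c in class_ids:
        # One mean vector per training style for this class.
        style_means = np.array([
            X[(classes == c) & (styles == s)].mean(axis=0) for s in style_ids
        ])
        mu0.append(style_means.mean(axis=0))         # prior mean
        tau2.append(style_means.var(axis=0) + 1e-6)  # between-style variance
    return np.array(mu0), np.array(tau2)

def responsibilities(X, mu, sigma2):
    """Posterior class probabilities under equal class priors (assumed)."""
    d = ((X[:, None, :] - mu[None, :, :]) ** 2 / sigma2).sum(axis=2)
    logp = -0.5 * d
    logp -= logp.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(logp)
    return r / r.sum(axis=1, keepdims=True)

def em_adapt(X_test, mu0, tau2, sigma2, n_iter=20):
    """Fit class means for one unseen style by EM with MAP updates, then classify."""
    mu = mu0.copy()
    for _ in range(n_iter):
        # E-step: soft assignment of each unlabeled sample to a class.
        r = responsibilities(X_test, mu, sigma2)
        # M-step: MAP estimate combining the prior with the weighted data.
        nk = r.sum(axis=0)[:, None]
        sx = r.T @ X_test
        mu = (mu0 / tau2 + sx / sigma2) / (1.0 / tau2 + nk / sigma2)
    labels = responsibilities(X_test, mu, sigma2).argmax(axis=1)
    return mu, labels
```

In this sketch, `labels` are indices into the sorted class identifiers seen by `fit_prior`. The M-step shrinks each class mean toward its prior mean when few test samples support that class, which is the sense in which the training styles inform classification of a style that was never seen.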