The paper describes a lexicon-driven approach to word recognition in handwritten documents using Conditional Random Fields (CRFs). CRFs are discriminative models: they model the conditional probability of a label sequence given the observations directly and make no independence assumptions about the observed data, and hence are known to outperform Hidden Markov Models (HMMs) on sequence labeling problems. For word recognition, the document is first segmented into word images using an existing neural-network-based algorithm. Each word image is then oversegmented into a number of small segments, such that combinations of consecutive segments form character images. Each segment, or group of segments, is labeled as a character with a probability evaluated from the CRF model. The total probability that a word image represents a given lexicon entry is computed by a dynamic programming algorithm that finds the optimal combination of segments.
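The dynamic programming step can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions: `scorer(start, end, ch)` is a hypothetical stand-in for the CRF-derived log-probability that segments `start` to `end - 1` form character `ch`; per-character log-probabilities are simply summed, which ignores the CRF's joint normalization and any transition terms; and `max_span`, the maximum number of segments one character may cover, is an assumed parameter. The functions `word_score` and `rank_lexicon` are hypothetical names.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scorer (assumption): log-probability that segments [start, end)
# form character `ch`. In the paper this role is played by the CRF model.
SegmentScorer = Callable[[int, int, str], float]


def word_score(word: str, n_segments: int, scorer: SegmentScorer,
               max_span: int = 4) -> Tuple[float, Optional[List[int]]]:
    """Best total log-score of `word` over all groupings of the n_segments
    consecutive segments into len(word) characters, each character covering
    1..max_span segments. Returns (score, ends), where ends[k] is the segment
    index at which character k ends, or (-inf, None) if no grouping exists."""
    m = len(word)
    neg_inf = -math.inf
    # best[k][i]: best score for matching word[:k] to segments [0, i).
    best = [[neg_inf] * (n_segments + 1) for _ in range(m + 1)]
    # back[k][i]: number of segments used by character k-1 in that best match.
    back = [[0] * (n_segments + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for k in range(1, m + 1):
        for i in range(1, n_segments + 1):
            for span in range(1, min(max_span, i) + 1):
                prev = best[k - 1][i - span]
                if prev == neg_inf:
                    continue  # word[:k-1] cannot end at segment i - span
                # Simplifying assumption: character scores add in log space.
                cand = prev + scorer(i - span, i, word[k - 1])
                if cand > best[k][i]:
                    best[k][i] = cand
                    back[k][i] = span
    if best[m][n_segments] == neg_inf:
        return neg_inf, None
    # Trace back the segment boundary chosen for each character.
    ends, i = [], n_segments
    for k in range(m, 0, -1):
        ends.append(i)
        i -= back[k][i]
    return best[m][n_segments], ends[::-1]


def rank_lexicon(lexicon: Sequence[str], n_segments: int,
                 scorer: SegmentScorer, max_span: int = 4) -> List[Tuple[str, float]]:
    """Score every lexicon entry against the word image, best first."""
    scored = [(w, word_score(w, n_segments, scorer, max_span)[0]) for w in lexicon]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Toy usage with made-up scores: 5 segments, where "c" = [0,2), "a" = [2,3), "t" = [3,5).
toy = {(0, 2, "c"): math.log(0.9), (2, 3, "a"): math.log(0.8), (3, 5, "t"): math.log(0.7)}


def toy_scorer(start: int, end: int, ch: str) -> float:
    return toy.get((start, end, ch), -10.0)  # low score for unlisted groupings


print(rank_lexicon(["cat", "cot", "at"], 5, toy_scorer))
```

The table `best` has one row per character of the lexicon entry and one column per segment, so scoring one entry costs on the order of `len(word) * n_segments * max_span` scorer calls. Bounding how many segments one character may span is what keeps this search polynomial rather than enumerating every possible grouping.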
Shravya Shetty, Harish Srinivasan, Sargur N. Sriha