Abstract. This paper presents an architecture that enables the recognizer to learn incrementally and thereby adapt to document image collections for performance improvement. We argue that the recognition scheme for a book could be considerably different from that designed for isolated pages. We employ learning procedures to capture the relevant information available online and feed it back to update the knowledge of the system. Experimental results show the effectiveness of our design for improving performance on the fly.

1 Adaptable OCR System

The success of document image indexing and retrieval in the newly emerging digital libraries depends considerably on the availability of robust OCRs that can handle the diversity in document image collections. The performance of state-of-the-art OCRs on these collections is not very encouraging [1,2]. A recent study by Lin [3] shows that document recognition research is still in great need of better accuracy and reliability, ...
Million Meshesha, C. V. Jawahar