This paper describes a system for handwritten Chinese text recognition integrating language model. On a text line image, the system generates character segmentation and word segme...
Word segmentation is a crucial step for segmentation-free document analysis systems and is used for creating an index based on word matching. In this paper, we propose a novel met...
We describe a segmentation method and associated file format for storing images of color documents. We separate each page of the document into three layers, containing the backgro...
Daniel P. Huttenlocher, Pedro F. Felzenszwalb, Wil...
The Arabic language has a very rich morphology where a word is composed of zero or more prefixes, a stem and zero or more suffixes. This makes Arabic data sparse compared to other...
The processing of Japanese text is complicated by the fact that there are no word delimiters. To segment Japanese text, systems typically use knowledge-based methods and large lex...