The problem of character recognition in a book should be formulated significantly different from that of a single page or word. An ideal approach to design such a recognizer is to...
Tokenization is one of the initial steps done for almost any text processing task. It is not particularly recognized as a challenging task for English monolingual systems but it r...
In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
Unlike English, where unfamiliar words can be queried for its meaning by typing out its letters, the analogous operation in Chinese is far from trivial due to the nature of its wr...
This paper presents a new method for segmenting and recognizing Chinese handwritten address character strings. First, a dissection algorithm is applied to over-segment string imag...