Conventional optical character recognition (OCR) systems operate on individual characters and words, and do not normally exploit document or collection context. We describe a Coll...
K. Pramod Sankar, C. V. Jawahar, Raghavan Manmatha
Hidden markov model (HMM) is frequently used for Pinyin-toChinese conversion. But it only captures the dependency with the preceding character. Higher order markov models can brin...
In a recent article, Nakhleh, Ringe and Warnow introduced perfect phylogenetic networks--a model of language evolution where languages do not evolve via clean speciation--and form...
Episodic knowledge is often stored in the form of textual narratives written in natural language. However, a large repository of such narratives will contain both repetitive and n...
Abstract. One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which...