We present a simple method for language-independent and task-independent text categorization learning, based on character-level n-gram language models. Our approach uses simple in...
We investigate the optimal language model (LM) treatment of abundant filled pauses (FP) in spontaneous monologues of a professional dictation task. Questions addressed here are (1) how to deal wi...
This paper presents an unsupervised method for discriminating among the senses of a given target word based on the context in which it occurs. Instances of a word that occur in si...
Manually constructing an inventory of word senses has suffered from problems including high cost, arbitrary assignment of meaning to words, and mismatch to domains. To overcome th...
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing f...