As Chinese text is written without word boundaries, effectively recognizing Chinese words is like recognizing collocations in English, substituting characters for words and words ...
We present sparse topical coding (STC), a non-probabilistic formulation of topic models for discovering latent representations of large collections of data. Unlike probabilistic t...
This paper presents the Topic-Aspect Model (TAM), a Bayesian mixture model which jointly discovers topics and aspects. We broadly define an aspect of a document as a characteristi...
In this paper, we propose a PLSA-based language model for sports live speech. This model is implemented in unigram rescaling technique that combines a topic model and an n-gram. I...
Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a...
David Newman, Chaitanya Chemudugunta, Padhraic Smy...