We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...
Data mining allows the exploration of sequences of phenomena, whereas one usually tends to focus on isolated phenomena or on the relation between two phenomena. It offers invaluab...
In Latent Semantic Indexing (LSI), a collection of documents is often pre-processed to form a sparse term-document matrix, followed by a computation of a low-rank approximation to...
We exploit sketch techniques, especially the Count-Min sketch, a memory, and time efficient framework which approximates the frequency of a word pair in the corpus without explic...
With the increase in information on the World Wide Web it has become difficult to quickly find desired information without using multiple queries or using a topic-specific search ...
Sofus A. Macskassy, Arunava Banerjee, Brian D. Dav...