In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering&...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...
—Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online da...
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of impo...
Map-reduce framework has received a significant attention and is being used for programming both large-scale clusters and multi-core systems. While the high productivity aspect of ...
We describe an approach to unsupervised high-accuracy recognition of the textual contents of an entire book using fully automatic mutual-entropy-based model adaptation. Given imag...