High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to...
We present a new class of problems, called resource-bounded information gathering for correlation clustering. Our goal is to perform correlation clustering under circumstances in w...
Nearest neighbor searching is a fundamental computational problem. A set of n data points is given in real d-dimensional space, and the problem is to preprocess these poi...
Improvements in microprocessor and networking performance have made networks of workstations a very attractive platform for high-end parallel and distributed computing. However, t...
We examine the learning-curve sampling method, an approach for applying machine-learning algorithms to large data sets. The approach is based on the observation that the computatio...