We present a framework for clustering distributed data in unsupervised and semi-supervised scenarios, taking into account privacy requirements and communication costs. Rather than...
We present a new L1-distance-based k-means clustering algorithm to address the challenge of clustering high-dimensional proportional vectors. The new algorithm explicitly incorpor...
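The abstract is cut off before it says what the new algorithm incorporates. What follows is therefore only a minimal sketch of the generic idea of k-means under the L1 distance (often called k-medians), not the authors' method. It assumes NumPy, a deterministic farthest-first initialisation, and a coordinate-wise median update, which minimises the summed L1 distance within each cluster. The function name `l1_kmeans` is hypothetical.

```python
import numpy as np

def l1_kmeans(X, k, n_iter=100):
    """Hypothetical sketch: Lloyd-style k-means with L1 distance (k-medians).

    Not the paper's algorithm; a generic illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Assumed farthest-first init: start at row 0, then repeatedly add the
    # point whose L1 distance to its nearest chosen center is largest.
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([np.abs(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # L1 distance from every point to every center: shape (n, k).
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # The coordinate-wise median minimises summed L1 distance in a cluster.
        new_centers = centers.copy()
        for j in range(k):
            members = X[new_labels == j]
            if len(members) > 0:
                new_centers[j] = np.median(members, axis=0)
        converged = np.array_equal(new_labels, labels) and np.allclose(new_centers, centers)
        labels, centers = new_labels, new_centers
        if converged:
            break
    return labels, centers
```

One caveat relevant to proportional vectors: the coordinate-wise median of vectors that each sum to 1 need not itself sum to 1. For the three points [0.9, 0.1, 0], [0.8, 0.2, 0] and [0.85, 0.1, 0.05] it is [0.85, 0.1, 0], which sums to 0.95. A plain sketch like this does not keep its centers on the probability simplex.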
Bonnie K. Ray, Hisashi Kashima, Jianying Hu, Monin...
Huge amounts of data are stored in autonomous, geographically distributed sources. The discovery of previously unknown, implicit and valuable knowledge is a key aspect of the expl...
This paper describes the realization of a parallel version of the k/h-means clustering algorithm. This is one of the basic algorithms used in a wide range of data mining tasks. We ...
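The abstract does not say how the parallel version is organised. The following is only a hedged sketch of a common data-parallel pattern for k-means iterations, and it uses ordinary Euclidean k-means rather than the k/h-means variant the paper names. Each worker (here a thread from `ThreadPoolExecutor`, standing in for a separate processor) assigns the points in its own chunk and returns per-cluster sums and counts. The coordinator adds these up and divides to obtain the new centers. The names `partial_stats` and `parallel_kmeans` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk, centers):
    """Worker step (hypothetical): assign one chunk, return per-cluster sums and counts."""
    d = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    k, dim = centers.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=int)
    np.add.at(sums, labels, chunk)   # accumulate each point into its cluster's sum
    np.add.at(counts, labels, 1)     # count points per cluster
    return sums, counts

def parallel_kmeans(X, init_centers, n_workers=4, n_iter=50):
    """Coordinator (hypothetical): combine worker statistics into new centers."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    chunks = np.array_split(X, n_workers)  # one data partition per worker
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            results = list(pool.map(lambda c: partial_stats(c, centers), chunks))
            sums = sum(r[0] for r in results)
            counts = sum(r[1] for r in results)
            new_centers = centers.copy()
            nonempty = counts > 0          # keep old center if a cluster is empty
            new_centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
    return centers
```

Only k sums and k counts cross the worker boundary in each iteration, so the communication cost does not grow with the number of points. Up to floating-point rounding, the result matches a serial run regardless of how the data are split.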
In recent years, several frameworks have been developed for processing very large quantities of data on large clusters of commodity PCs. These frameworks have focused on fault-tole...