Sciweavers

DBISP2P
2008
Springer

Exploiting Distribution Skew for Scalable P2P Text Clustering

K-Means clustering is widely used in information retrieval and data mining. Distributed K-Means variants have already been proposed, but none of the past algorithms scales to large numbers of nodes. In this work we describe a new P2P algorithm which significantly reduces communication costs by exploiting the distribution skew naturally found in text and other datasets. The algorithm achieves high clustering quality and requires no synchronization between peers. An extensive evaluation with up to 100,000 peers shows the algorithm's effectiveness and scalability as well as its ability to cope with churn.
Added 19 Oct 2010
Updated 19 Oct 2010
Type Conference
Year 2008
Where DBISP2P
Authors Odysseas Papapetrou, Wolf Siberski, Fabian Leitritz, Wolfgang Nejdl