Exploiting Distribution Skew for Scalable P2P Text Clustering

K-Means clustering is widely used in information retrieval and data mining. Distributed K-Means variants have already been proposed, but none of the past algorithms scales to large numbers of nodes. In this work we describe a new P2P algorithm that significantly reduces the communication costs involved by exploiting the distribution skew naturally found in text and other datasets. The algorithm achieves high clustering quality and requires no synchronization between peers. An extensive evaluation with up to 100,000 peers shows the algorithm's effectiveness and scalability as well as its ability to cope with churn.

Odysseas Papapetrou, Wolf Siberski, Fabian Leitritz, Wolfgang Nejdl

Database | DBISP2P 2008 | K-means Clustering | K-Means Variants | Past Algorithms Scales |

» Cluster Computing on the Fly P2P Scheduling of Idle Cycles in the Internet

» A peertopeer architecture for massive multiplayer online games

» Leveraging a scalable row store to build a distributed text index

» SNAP Smallworld Network Analysis and Partitioning An opensource parallel graph framework f...

» GigaTensor scaling tensor analysis up by 100 times algorithms and discoveries

Post Info

Added	19 Oct 2010
Updated	19 Oct 2010
Type	Conference
Year	2008
Where	DBISP2P
Authors	Odysseas Papapetrou, Wolf Siberski, Fabian Leitritz, Wolfgang Nejdl

Sciweavers
