Sciweavers

DATAMINE
1999

A Fast Parallel Clustering Algorithm for Large Spatial Databases

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the "shared-nothing" architecture, with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit/s). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
Xiaowei Xu, Jochen Jäger, Hans-Peter Kriegel
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal
Year 1999
Where DATAMINE
Authors Xiaowei Xu, Jochen Jäger, Hans-Peter Kriegel
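
The density-based notion of clusters mentioned in the abstract can be illustrated with a minimal sequential DBSCAN sketch. This is not the authors' PDBSCAN implementation: it runs on a single machine and substitutes a brute-force neighborhood search for the spatial index (the dR-tree in the paper). The function names `region_query` and `dbscan`, and the parameter names `eps` and `min_pts`, are chosen here for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i].

    Brute force, O(n) per query. A spatial index would replace this
    step in a real system (assumption: any index with the same result).
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may be relabeled later as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # points[j] is also a core point: keep expanding through it.
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    # Two dense 2x2 squares and one isolated point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

In this sketch, the isolated point is labeled as noise because its neighborhood contains fewer than `min_pts` points, while each dense square forms its own cluster. A parallel variant along the lines of the abstract would partition the points across machines and run the same expansion on each partition, but the merging of partial clusters is not shown here.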