Sciweavers

ADBIS
2015
Springer

CLUS: Parallel Subspace Clustering Algorithm on Spark

Subspace clustering techniques were proposed to discover hidden clusters that exist only in certain subsets of the full feature space. However, the time complexity of such algorithms is at most exponential with respect to the dimensionality of the dataset. In addition, under current big data scenarios, datasets are generally too large to fit on a single machine. The extremely high computational complexity, which results in poor scalability with respect to both the size and the dimensionality of these datasets, gives us strong motivation to propose a parallelized subspace clustering algorithm able to handle large high-dimensional data. To the best of our knowledge, there are no other parallel subspace clustering algorithms that run on top of new-generation big data distributed platforms such as MapReduce and Spark. In this paper we introduce CLUS: a novel parallel solution for subspace clustering based on the SUBCLU algorithm. CLUS uses a new dynamic data partitioning method specifically desig...
Bo Zhu, Alexandru Mara, Alberto Mozo
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ADBIS
Authors Bo Zhu, Alexandru Mara, Alberto Mozo