We propose a novel method for analyzing high-dimensional data that contain arbitrarily shaped projected clusters and high levels of noise. At the core of our method lies the idea of subspace validity: we map the data in a way that allows us to assess the quality of subspaces using statistical tests. Experimental results on both synthetic and real data sets demonstrate the potential of our method.
Amihood Amir, Reuven Kashi, Nathan S. Netanyahu, D