We define a cluster as a region of high density that is separated from other such regions by sparse regions. By exploiting the downward closure property of density, the search for interesting structure in a high-dimensional space can be reduced to a search for structure in lower-dimensional subspaces. We present a Hierarchical Projection Pursuit Clustering (HPPC) algorithm that repeatedly bi-partitions the dataset based on the discovered properties of interesting 1-dimensional projections. We describe a projection search procedure and a projection pursuit index function based on Cho, Haralick and Yi's improvement of the Kittler and Illingworth optimal threshold technique. The output of the algorithm is a decision tree whose internal nodes each store a projection and a threshold and whose leaves represent the clusters (classes). Experiments on real and synthetic datasets demonstrate the effectiveness of the approach.
Alexei D. Miasnikov, Jayson E. Rome, Robert M. Har
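
To make the bi-partitioning idea in the abstract concrete, the following is a minimal sketch under several assumptions and should not be read as the authors' implementation. It uses the plain Kittler and Illingworth minimum-error criterion rather than the Cho, Haralick and Yi improvement, and it replaces the projection search procedure with a fixed list of candidate directions. All names (`min_error_threshold`, `project`, `build_tree`), the stopping parameters (`min_size`, `min_gain`, `max_depth`), the bin count, and the variance floor are hypothetical choices made for illustration.

```python
import math
import random


def min_error_threshold(values, bins=64):
    """Kittler-Illingworth minimum-error threshold on 1-D values.

    Returns (threshold, gain), where gain is how much the two-Gaussian
    criterion J improves on the one-Gaussian value; (None, 0.0) if degenerate.
    """
    lo, hi = min(values), max(values)
    if hi <= lo:
        return None, 0.0
    n, width = len(values), (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    floor = width * width / 12.0  # assumed variance floor: spread inside one bin
    tot_s = sum(c * x for c, x in zip(counts, centers))
    tot_q = sum(c * x * x for c, x in zip(counts, centers))
    j_single = 1.0 + math.log(max(tot_q / n - (tot_s / n) ** 2, floor))

    best_t, best_j = None, math.inf
    n1 = s1 = q1 = 0.0
    for i in range(bins - 1):
        n1 += counts[i]
        s1 += counts[i] * centers[i]
        q1 += counts[i] * centers[i] ** 2
        n2 = n - n1
        if n1 == 0 or n2 == 0:
            continue
        var1 = max(q1 / n1 - (s1 / n1) ** 2, floor)
        var2 = max((tot_q - q1) / n2 - ((tot_s - s1) / n2) ** 2, floor)
        p1, p2 = n1 / n, n2 / n
        # J(t) = 1 + [P1 ln var1 + P2 ln var2] - 2 [P1 ln P1 + P2 ln P2]
        j = (1.0 + p1 * math.log(var1) + p2 * math.log(var2)
             - 2.0 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = lo + (i + 1) * width, j
    if best_t is None:
        return None, 0.0
    return best_t, j_single - best_j


def project(points, direction):
    """Project each point onto a direction vector (dot product)."""
    return [sum(x * d for x, d in zip(p, direction)) for p in points]


def build_tree(points, directions, min_size=20, min_gain=0.5, max_depth=8):
    """Recursively bi-partition points; returns a nested-dict decision tree."""
    best = None
    if len(points) >= min_size and max_depth > 0:
        for d in directions:
            t, gain = min_error_threshold(project(points, d))
            if t is not None and gain > min_gain and (best is None or gain > best[0]):
                best = (gain, d, t)
    if best is None:
        return {"leaf": points}
    _, d, t = best
    proj = project(points, d)
    left = [p for p, v in zip(points, proj) if v < t]
    right = [p for p, v in zip(points, proj) if v >= t]
    if not left or not right:
        return {"leaf": points}
    rest = (directions, min_size, min_gain, max_depth - 1)
    return {"direction": d, "threshold": t,
            "left": build_tree(left, *rest), "right": build_tree(right, *rest)}


# Synthetic example: two well-separated blobs that differ along the first axis.
rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(6.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
tree = build_tree(blob_a + blob_b, directions=[(1.0, 0.0), (0.0, 1.0)])
```

In this sketch each internal node stores a `direction` and a `threshold`, and each leaf holds the points of one cluster, mirroring the decision tree described in the abstract. The `min_gain` test is one assumed way to decide when a projection is no longer bimodal enough to split; the paper's own stopping rule is not stated in the abstract.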