Many applications require the clustering of large amounts of high-dimensional data. Most clustering algorithms, however, do not work effectively and efficiently in high-dimensional space, which is due to the so-called "curse of dimensionality". In addition, the high-dimensional data often contains a significant amount of noise which causes additional effectiveness problems. In this paper, we review and compare the existing algorithms for clustering high-dimensional data and show the impact of the curse of dimensionality on their effectiveness and efficiency. The comparison reveals that condensation-based approaches such as BIRCH or STING are the most promising candidates for achieving the necessary efficiency, but it also shows that basically all condensation-based approaches have severe weaknesses with respect to their effectiveness in high-dimensional space. To overcome these problems, we develop a new clustering technique called OptiGrid which is based on constructing an optimal grid...
Alexander Hinneburg, Daniel A. Keim