In previous work, we proposed a novel approach to data clustering based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [4, 5, 6]. In a comparison with alternative clustering techniques, the approach performed well across a range of difficult data properties, including overlapping clusters, elongated cluster shapes and unequally sized clusters. In this paper, we make three modifications to the algorithm that improve its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, we introduce new initialization and mutation schemes that enable a more efficient exploration of the search space, and we modify the null data model that is used as a basis for selecting the most significant solution from the Pareto front. The high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite.
Julia Handl, Joshua D. Knowles