We present a method for initialising the K-means clustering algorithm. Our method hinges on the use of a kd-tree to estimate the density of the data at various locations. We then use a modification of Katsavounidis' algorithm, which incorporates this density information, to choose K seeds for the K-means algorithm. We test our algorithm on 36 synthetic data sets and compare it with 25 runs of Forgy's random initialisation method.

Key words: Clustering, K-means algorithm, kd-tree, Initialisation, Density estimation
Stephen J. Redmond, Conor Heneghan
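
The seeding idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the kd-tree is built by median splits on the widest dimension, the density of a leaf bucket is taken as its point count divided by its bounding-box volume, and density is combined with a farthest-point (maximin) seed selection by multiplying the two. The function names `build_leaves`, `leaf_summary` and `choose_seeds`, and the parameters `leaf_size` and `eps`, are hypothetical.

```python
import math


def build_leaves(points, leaf_size):
    """Split points at the median of the widest dimension; return the leaf buckets."""
    if len(points) <= leaf_size:
        return [points]
    dims = range(len(points[0]))
    # Assumption: split along the dimension with the largest spread.
    widest = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[widest])
    mid = len(ordered) // 2
    return build_leaves(ordered[:mid], leaf_size) + build_leaves(ordered[mid:], leaf_size)


def leaf_summary(bucket, eps=1e-9):
    """Return (centroid, density) for one leaf bucket.

    Assumption: density = point count / bounding-box volume, with each side
    floored at eps so that degenerate (flat) buckets do not divide by zero.
    """
    dim = len(bucket[0])
    centroid = tuple(sum(p[d] for p in bucket) / len(bucket) for d in range(dim))
    volume = 1.0
    for d in range(dim):
        side = max(p[d] for p in bucket) - min(p[d] for p in bucket)
        volume *= max(side, eps)
    return centroid, len(bucket) / volume


def choose_seeds(points, k, leaf_size=8):
    """Pick k seeds from leaf centroids using a density-weighted maximin rule."""
    summaries = [leaf_summary(b) for b in build_leaves(points, leaf_size)]
    if k > len(summaries):
        raise ValueError("k exceeds the number of leaf buckets; reduce leaf_size")
    centroids = [c for c, _ in summaries]
    densities = [d for _, d in summaries]

    # First seed: centroid of the densest leaf bucket (assumed starting rule).
    first = max(range(len(centroids)), key=lambda i: densities[i])
    seeds = [centroids[first]]
    nearest = [math.dist(c, seeds[0]) for c in centroids]

    # Remaining seeds: maximise (distance to nearest chosen seed) * density.
    while len(seeds) < k:
        best = max(range(len(centroids)), key=lambda i: nearest[i] * densities[i])
        seeds.append(centroids[best])
        nearest = [min(n, math.dist(c, centroids[best])) for n, c in zip(nearest, centroids)]
    return seeds


if __name__ == "__main__":
    # Two small, well-separated grids of points as toy data.
    cluster_a = [(i * 0.1, j * 0.1) for i in range(4) for j in range(4)]
    cluster_b = [(10 + i * 0.1, 10 + j * 0.1) for i in range(4) for j in range(4)]
    print(choose_seeds(cluster_a + cluster_b, k=2, leaf_size=4))
```

The returned seeds would then be passed to any standard K-means routine as its initial centres. Weighting distance by density is one plausible way to keep the maximin rule from selecting isolated outliers; other weightings are possible.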