The distance measure plays an important role in clustering data points, and choosing the right one for a given dataset is a non-trivial problem. In this paper, we study various distance measures and their effect on different clustering techniques. In addition to the standard Euclidean distance, we use Bit-Vector based, Comparative Clustering based, Huffman code based and Dominance based distance measures. Using these measures, we cluster synthetic datasets and one real-life dataset with the k-means, matrix partitioning and dominance based clustering algorithms. We analyse the results of our study on the real-life dataset, which is drawn from cricket, and compare the accuracy of the various techniques on the synthetic datasets.
Ankita Vimal, Satyanarayana R. Valluri, Kamalakar