Clustering accuracy of partitional clustering algorithms for categorical data depends primarily on the choice of the initial data points (modes) used to start the clustering process. Traditionally, initial modes are chosen randomly, so clustering results cannot be generated and reproduced consistently. In this paper we present an approach to compute initial modes for the K-modes clustering algorithm when clustering categorical data sets. We use the idea of Evidence Accumulation to combine the results of multiple clusterings. First, the n F-dimensional data objects are decomposed into a large number of compact clusters. The K-modes algorithm performs this decomposition, and several clusterings are obtained from N random initializations of the K-modes algorithm. The modes obtained from every randomly initialized run are stored in a Mode-Pool, P_N. The objective is to investigate the contribution of those data objects/patterns that are less vulnerable to the choice o...
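A minimal sketch of the decomposition step described above, assuming the standard K-modes algorithm with the simple matching dissimilarity and K distinct data objects drawn at random as initial modes. The function names (`k_modes`, `build_mode_pool`) and the parameters `k`, `n_runs` and `seed` are hypothetical illustration choices, not the authors' implementation. The abstract is truncated before it explains how the Mode-Pool is used afterwards, so that step is not shown.

```python
import random
from collections import Counter


def matching_dissimilarity(x, y):
    """Simple matching distance: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def column_mode(rows):
    """Attribute-wise most frequent value (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, rng, max_iter=100):
    """Basic K-modes; initial modes are k randomly sampled data objects."""
    modes = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assign each object to its nearest mode under matching dissimilarity.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Recompute each mode as the attribute-wise mode of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = column_mode(members)
    return modes, labels


def build_mode_pool(data, k, n_runs, seed=0):
    """Hypothetical helper: run K-modes n_runs times with different random
    initial modes and collect every resulting mode into one pool (P_N)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_runs):
        modes, _ = k_modes(data, k, rng)
        pool.extend(modes)
    return pool
```

Seeding the generator makes the pool itself reproducible, which a single unseeded random initialization of K-modes does not guarantee. With a large `k` each run yields many small, compact clusters, in line with the decomposition the abstract describes.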
Shehroz S. Khan, Shri Kant