Motivation: In cluster analysis, the validity of specific solutions, algorithms, and procedures present significant challenges because there is no null hypothesis to test and no 'right answer'. It has been noted that a replicable classification is not necessarily a useful one, but a useful one that characterizes some aspect of the population must be replicable. By replicable we mean reproducible across multiple samplings from the same population. Methodologists have suggested that the validity of clustering methods should be based on classifications that yield reproducible findings beyond chance levels. We used this approach to determine the performance of commonly used clustering algorithms and the degree of replicability achieved using several microarray datasets. Methods: We considered four commonly used iterative partitioning algorithms (Self Organizing Maps (SOM), K-means, Clutsering LARge Applications (CLARA), and Fuzzy C-means) and evaluated their performances on 37 m...
Nikhil R. Garge, Grier P. Page, Alan P. Sprague, B