A fuzzy c-means algorithm was adapted for analyzing microarray data. The adaptation consisted of two changes: initializing the fuzzy centroids with gene ontology information, and using Pearson correlation distance in the objective function. To initialize the fuzzy centroids, we classified genes by their gene ontology terms and used the resulting groups as the initial fuzzy clusters. The Pearson correlation distance becomes 0 when two genes are perfectly correlated, whether positively or negatively. The algorithm was applied to yeast and lung cancer microarray datasets, where it outperformed the conventional fuzzy c-means algorithm by associating more genes with functional groups.
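The two adaptations can be illustrated with a minimal, pure-Python sketch. It rests on assumptions the abstract does not state: the distance is taken as 1 - |r|, one form that is 0 for perfect positive or negative correlation; the GO-based classification is represented by a hypothetical list `go_labels` giving each gene a cluster index, or `None` for unclassified genes; and centroids use the standard weighted-mean fuzzy c-means update, which is only a heuristic when the distance is not Euclidean. All function names are illustrative, not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def pearson_distance(x, y):
    """Assumed form 1 - |r|: 0 for perfect positive or negative correlation."""
    return 1.0 - abs(pearson(x, y))


def initial_memberships(go_labels, c):
    """Hypothetical GO-based start: a gene with a GO class is a crisp member
    of that cluster; an unclassified gene (None) gets uniform membership."""
    rows = []
    for label in go_labels:
        if label is None:
            rows.append([1.0 / c] * c)
        else:
            rows.append([1.0 if j == label else 0.0 for j in range(c)])
    return rows


def centroids(data, u, m):
    """Standard FCM centroid: mean of profiles weighted by u_ij ** m."""
    dim = len(data[0])
    result = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        if total == 0:
            raise ValueError(f"cluster {j} has no members")
        result.append([sum(wi * x[k] for wi, x in zip(w, data)) / total
                       for k in range(dim)])
    return result


def update_memberships(data, v, m):
    """Minimize J = sum_ij u_ij^m D_ij for fixed centroids, with D = pearson_distance:
    u_ij = 1 / sum_k (D_ij / D_ik) ** (1 / (m - 1))."""
    c = len(v)
    rows = []
    for x in data:
        d = [pearson_distance(x, vj) for vj in v]
        zeros = [j for j, dj in enumerate(d) if dj < 1e-12]
        if zeros:
            # Gene coincides (in correlation terms) with one or more centroids.
            rows.append([1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)])
        else:
            rows.append([1.0 / sum((d[j] / d[k]) ** (1.0 / (m - 1)) for k in range(c))
                         for j in range(c)])
    return rows


def fuzzy_cmeans(data, go_labels, c, m=2.0, max_iter=100, tol=1e-6):
    """Alternate centroid and membership updates from a GO-based initialization."""
    u = initial_memberships(go_labels, c)
    for _ in range(max_iter):
        v = centroids(data, u, m)
        new_u = update_memberships(data, v, m)
        delta = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if delta < tol:
            break
    return u, centroids(data, u, m)
```

For example, `fuzzy_cmeans(profiles, [0, 0, 1, 1, None], c=2)` starts the first four genes in their GO-derived clusters and lets the fifth gene's memberships be set by the data. One caveat of this sketch: because 1 - |r| places anti-correlated genes close together, a weighted-mean centroid over such genes can flatten out, so a real implementation may need a sign-aware centroid update.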
Mingrui Zhang, Terry M. Therneau, Michael A. McKen