In this article we describe an algorithm for feature selection and gene clustering from high-dimensional gene expression data. The method measures the similarity between features (genes) and uses it to remove redundant ones. Because it requires no search, it is fast. A novel feature similarity measure, called the maximum information compression index, is used. The feature selection algorithm also yields gene clusters in a multiscale fashion. The superiority of the algorithm, in terms of speed and performance, is demonstrated on a real-life molecular cancer classification dataset.
D. Dutta Majumder, Pabitra Mitra
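
The abstract does not define the similarity measure, so the following is only a minimal sketch under a stated assumption: that the index for two features is the smallest eigenvalue of their 2x2 covariance matrix, which is zero when the features are linearly dependent and grows as they become less redundant. The function names `compression_index` and `similarity_matrix` are hypothetical and are not taken from the article.

```python
import numpy as np


def compression_index(x, y):
    """Assumed definition: smallest eigenvalue of the 2x2 covariance
    matrix of features x and y. Near zero means one feature is
    (almost) a linear function of the other, i.e. redundant."""
    cov = np.cov(x, y)  # 2x2 sample covariance matrix
    smallest = np.linalg.eigvalsh(cov)[0]  # eigvalsh returns ascending order
    return max(float(smallest), 0.0)  # clip tiny negative rounding error


def similarity_matrix(X):
    """Pairwise index for all columns (features/genes) of X,
    where rows of X are samples."""
    n_features = X.shape[1]
    S = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            S[i, j] = S[j, i] = compression_index(X[:, i], X[:, j])
    return S
```

Under this assumption, a feature pair with a low index is a candidate for redundancy removal, since one of the two carries little information beyond the other. How the article turns these pairwise values into a selected feature subset and multiscale clusters is not stated in the abstract and is not sketched here.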