This paper introduces a novel method for selecting a minimum number of genes (features) for a classification problem based on gene expression data, with the objective of maximising classification accuracy. The method uses a hybrid of the Pearson correlation coefficient (PCC) and signal-to-noise ratio (SNR) methods, combined with an evolving classification function (ECF). First, the correlation coefficients between genes in a set of thousands are calculated. Genes that are highly correlated across samples are considered either dependent or coregulated and form a group (a cluster). The SNR method is then applied to rank the correlated genes in each group according to their discriminative power towards the classes. The genes with the highest SNR are used in a preliminary feature set as representatives of each group. An incremental algorithm that consists of selecting a minimum number of genes (variables) from the preliminary feature set, starting from one gene, is then applie...
Liang Goh, Qun Song, Nikola K. Kasabov
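
The grouping and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the correlation threshold of 0.8, the use of absolute correlation, the greedy grouping strategy, and the two-class SNR formula |mean0 - mean1| / (std0 + std1) are all assumptions made for illustration. The incremental ECF-based selection step is not shown.

```python
import numpy as np


def snr_scores(X, y):
    """Per-gene signal-to-noise ratio for a two-class problem.

    X is an (n_samples, n_genes) expression matrix and y holds labels 0/1.
    Assumed formula: |mean0 - mean1| / (std0 + std1).
    """
    X0, X1 = X[y == 0], X[y == 1]
    num = np.abs(X0.mean(axis=0) - X1.mean(axis=0))
    den = X0.std(axis=0) + X1.std(axis=0)
    return num / (den + 1e-12)  # small constant avoids division by zero


def preliminary_feature_set(X, y, corr_threshold=0.8):
    """Return one representative gene index per correlated group.

    Greedy grouping (an assumption): genes are visited from highest to
    lowest SNR; each unassigned gene seeds a group containing every
    unassigned gene whose absolute PCC with it meets the threshold.
    Because of the visiting order, the seed is the highest-SNR gene
    of its group.
    """
    corr = np.corrcoef(X, rowvar=False)  # gene-by-gene Pearson correlation
    snr = snr_scores(X, y)
    unassigned = set(range(X.shape[1]))
    representatives = []
    for g in np.argsort(-snr):
        g = int(g)
        if g not in unassigned:
            continue
        group = {g} | {j for j in unassigned
                       if abs(corr[g, j]) >= corr_threshold}
        unassigned -= group
        representatives.append(g)
    return representatives


if __name__ == "__main__":
    # Synthetic data: genes 0 and 1 are correlated and class-dependent,
    # genes 2 and 3 are correlated and uninformative.
    rng = np.random.default_rng(0)
    y = np.array([0] * 20 + [1] * 20)
    a = 3.0 * y + rng.normal(0.0, 0.5, size=40)
    b = rng.normal(0.0, 1.0, size=40)
    X = np.column_stack([
        a, a + rng.normal(0.0, 0.05, size=40),
        b, b + rng.normal(0.0, 0.05, size=40),
    ])
    print(preliminary_feature_set(X, y))
```

Visiting genes in descending SNR order means no separate per-group ranking pass is needed. A different clustering scheme, such as hierarchical clustering on the correlation matrix, would give different groups.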