A great deal of recent research has focused on the challenging task of selecting differentially expressed genes from microarray data ('gene selection'). Numerous gene selection algorithms have been proposed in the literature, but it is often unclear exactly how these algorithms respond to conditions such as small sample sizes or differing variances. Choosing an appropriate algorithm can therefore be difficult. In this paper, we propose a theoretical analysis of gene selection in which the probability of successfully selecting relevant genes, using a given gene ranking function, is explicitly calculated in terms of population parameters. The theory is applicable to any ranking function whose sampling distribution is known or can be approximated analytically. In contrast to empirical methods, the analysis can easily be used to examine the behaviour of gene selection algorithms under a wide variety of conditions, even when the numbers of genes invo...
Sach Mukherjee, Stephen J. Roberts
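
As a toy illustration of the kind of calculation the abstract describes (and not the authors' own analysis), consider ranking genes by the difference in sample means between two groups of n samples each. Assume, purely for illustration, Gaussian expression values with known per-gene standard deviations and independent genes. The probability that a relevant gene with true mean difference delta is ranked above a null gene then follows directly from the population parameters. The function name and all parameter names below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_relevant_outranks_null(delta, sigma_rel, sigma_null, n):
    """Probability that a relevant gene's observed mean difference exceeds
    that of a null gene (signed difference-of-means ranking).

    Assumptions (illustrative only): two groups of n samples each,
    Gaussian data, known standard deviations, independent genes.
    """
    # Each observed difference of group means has variance 2 * sigma^2 / n.
    # The gap between the two genes' differences is Gaussian with mean delta
    # and variance equal to the sum of the two variances.
    var_gap = 2.0 * (sigma_rel ** 2 + sigma_null ** 2) / n
    return NormalDist().cdf(delta / sqrt(var_gap))


if __name__ == "__main__":
    # How the probability of a correct ordering changes with sample size.
    for n in (3, 5, 10, 20):
        p = prob_relevant_outranks_null(
            delta=1.0, sigma_rel=1.0, sigma_null=1.0, n=n
        )
        print(f"n = {n:2d}: P(correct ordering) = {p:.3f}")
```

Because the result is a closed-form expression in delta, the standard deviations and n, it can be evaluated over a range of conditions (for example, unequal variances or very small n) without simulation, which is the general advantage of an analytical approach over an empirical one.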