Background: Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related, publicly available datasets. To date, no detailed attempt has been made to explore the properties of this genetic algorithm in proteomic applications. Here the algorithm's performance on these datasets is evaluated relative to other methods.

Results: In reproducing the method, some modifications to the algorithm as described are necessary to obtain good performance. After modification, a cross-validation approach to model selection is used. The overall classification accuracy is comparable to, though not superior to, that of the other approaches considered. In addition, some aspects of the process rely on random sampling, and thus, for a fixed dataset, the algorithm can produce many different models. This raises questions about h...
Neal O. Jeffries