Background: Recent technological advances in mass spectrometry pose challenges in computational mathematics and statistics to process the mass spectral data into predictive models with clinical and biological significance. We discuss several classification-based approaches to finding protein biomarker candidates using protein profiles obtained via mass spectrometry, and we assess their statistical significance. Our overall goal is to implicate peaks that have a high likelihood of being biologically linked to a given disease state, and thus to narrow the search for biomarker candidates. Results: Thorough cross-validation studies and randomization tests are performed on a prostate cancer dataset with over 300 patients, obtained at the Eastern Virginia Medical School using SELDITOF mass spectrometry. We obtain average classification accuracies of 87% on a four-group classification problem using a two-stage linear SVM-based procedure and just 13 peaks, with other methods performing compar...
Michael Wagner, Dayanand N. Naik, Alex Pothen, Sri