Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification alg

Background: Data generated using 'omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant to the design of biomedical studies in which the goal is to discover a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying the signal-to-noise ratio in the dataset, the imbalance in class distribution and the choice of metric for quantifying classifier performance. To guide study design, we present a summary of the key characteristics of 'omics' data profiled in several human or animal model experiments utilizing high-content mass spectrometry...
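The kind of experiment the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' code or simulation design: it assumes Gaussian features, uses a plain nearest-centroid rule as a simplified stand-in for a classifier (it is not PAM, KNN, Random Forests or SVM), and all function names and parameter values (`simulate`, `evaluate`, `effect`, `frac_pos`, and so on) are hypothetical. It shows how, with many more features than subjects and an imbalanced class distribution, plain accuracy and balanced accuracy can be compared as the signal strength changes.

```python
import random


def simulate(n, p, n_signal, effect, frac_pos, rng):
    """Draw n subjects with p unit-variance Gaussian features.

    The first n_signal features have mean `effect` in the positive class;
    all other features are pure noise. Class counts are fixed, not random.
    """
    n_pos = min(n - 1, max(1, round(n * frac_pos)))
    y = [1] * n_pos + [0] * (n - n_pos)
    X = [[rng.gauss(effect if (lab == 1 and j < n_signal) else 0.0, 1.0)
          for j in range(p)] for lab in y]
    return X, y


def fit_centroids(X, y):
    """Per-class feature means (assumed stand-in classifier, not PAM)."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def d2(c):
        return sum((a - b) ** 2 for a, b in zip(x, cents[c]))
    return min((0, 1), key=d2)


def scores(y_true, y_pred):
    """Return (accuracy, balanced accuracy) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and q == 1 for t, q in pairs)
    tn = sum(t == 0 and q == 0 for t, q in pairs)
    fp = sum(t == 0 and q == 1 for t, q in pairs)
    fn = sum(t == 1 and q == 0 for t, q in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, (sensitivity + specificity) / 2


def evaluate(effect, n_train=40, n_test=200, p=500, n_signal=20,
             frac_pos=0.2, seed=0):
    """Train on a small imbalanced sample, score on an independent test set."""
    rng = random.Random(seed)
    X_tr, y_tr = simulate(n_train, p, n_signal, effect, frac_pos, rng)
    X_te, y_te = simulate(n_test, p, n_signal, effect, frac_pos, rng)
    cents = fit_centroids(X_tr, y_tr)
    return scores(y_te, [predict(cents, x) for x in X_te])


if __name__ == "__main__":
    # Illustrative effect sizes only; larger values mean a stronger signal.
    for effect in (0.0, 0.5, 3.0):
        acc, bal = evaluate(effect)
        print(f"effect={effect}: accuracy={acc:.2f}, balanced accuracy={bal:.2f}")
```

Reporting both metrics side by side is a deliberate choice here: under class imbalance a classifier can reach high accuracy by favoring the majority class, which balanced accuracy penalizes.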

Yu Guo, Armin Graber, Robert N. McBurney, Raji Bal

Biomedical Studies | BMCBI 2010 | Prediction Analysis | Random Forests |

» CGHpower: exploring sample size calculations for chromosomal copy number experiments

» Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data

» GeneTrailExpress: a web-based pipeline for the statistical evaluation of microarray experime...

» A boosting method for maximizing the partial area under the ROC curve

» A powerful method for detecting differentially expressed genes from GeneChip arrays that d...

» Stability of gene contributions and identification of outliers in multivariate analysis of...

» Performance evaluation of pattern classifiers for handwritten character recognition

» Texture retrieval based on a nonparametric measure for multivariate distributions

Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2010
Where	BMCBI
Authors	Yu Guo, Armin Graber, Robert N. McBurney, Raji Balasubramanian
