Sciweavers

KDD
2004
ACM

Learning a complex metabolomic dataset using random forests and support vector machines

Metabolomics is the omics science of biochemistry. The associated data include quantitative measurements of all small-molecule metabolites in a biological sample. These datasets provide a window into dynamic biochemical networks and, in conjunction with other omic data (genes and proteins), have great potential to unravel complex human diseases. The dataset used in this study has 63 individuals, normal and diseased; the diseased are either drug treated or untreated, so there are three classes. The goal is to classify these individuals using the observed levels of 317 measured metabolites. There are a number of statistical challenges: the data are non-normal; the number of samples is less than the number of metabolites; there are missing data, and the fact that data are missing is informative (assay values below detection limits can point to a specific class); and there are high correlations among the metabolites. We investigate support vector machines (SVM) and random forests (RF) for outl...
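The abstract notes that missingness itself carries class information. One common way to expose that to a classifier such as an SVM or RF is to pair each metabolite with a 0/1 "was missing" flag and fill the gap with a small placeholder value. The sketch below illustrates that idea only; it is not taken from the paper. The function name `encode_missingness`, the `fill_fraction` parameter, the half-minimum fill rule, and the toy values are all assumptions made for illustration.

```python
def encode_missingness(rows, fill_fraction=0.5):
    """Append a missing-value flag per feature and fill the gaps.

    rows: list of samples; each sample is a list of floats, with None
          marking a value that is missing (e.g. below the detection limit).
    fill_fraction: assumed rule; missing entries are replaced by this
          fraction of the smallest observed value in the same column.
    Returns a new list where each sample is [filled values..., flags...].
    """
    n_features = len(rows[0])

    # Per-column fill value: a fraction of the column's observed minimum,
    # or 0.0 if the column has no observed values at all.
    fills = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        fills.append(fill_fraction * min(observed) if observed else 0.0)

    encoded = []
    for r in rows:
        values = [fills[j] if v is None else v for j, v in enumerate(r)]
        flags = [1.0 if v is None else 0.0 for v in r]
        encoded.append(values + flags)
    return encoded


# Toy data: 3 samples x 3 metabolites (made-up numbers, not from the study).
samples = [
    [4.0, None, 1.5],
    [2.0, 0.8, None],
    [6.0, 0.4, 3.0],
]
encoded = encode_missingness(samples)
for row in encoded:
    print(row)
```

Each sample doubles in width: the first half holds the filled metabolite levels and the second half holds the flags, so a downstream classifier can use the pattern of missing values as a feature in its own right.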
Young Truong, Xiaodong Lin, Chris Beecher
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2004
Where KDD
Authors Young Truong, Xiaodong Lin, Chris Beecher