Bias in random forest variable importance measures: Illustrations, sources and a solution

Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale level or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, t...
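The claim that predictors with more categories can be favored is easy to see in a small null simulation. The sketch below is not the authors' simulation design. It is a minimal illustration, assuming a binary response that is independent of all three predictors, a Gini impurity split criterion, and threshold splits only (the ten-level predictor is treated as ordered for simplicity). The names `best_gini_gain` and `simulate` are hypothetical helpers introduced here. The sketch counts how often each uninformative predictor would be chosen for the root split of a single tree.

```python
import random


def gini(pos, n):
    """Gini impurity of a node holding `pos` positives among `n` samples."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_gini_gain(x, y):
    """Largest impurity decrease over all threshold splits of the form x <= c."""
    n = len(y)
    total_pos = sum(y)
    parent = gini(total_pos, n)
    pairs = sorted(zip(x, y))
    best = 0.0
    left_n = left_pos = 0
    for i in range(n - 1):
        left_n += 1
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no cutpoint exists between tied values
        right_n = n - left_n
        right_pos = total_pos - left_pos
        child = (left_n * gini(left_pos, left_n)
                 + right_n * gini(right_pos, right_n)) / n
        best = max(best, parent - child)
    return best


def simulate(trials=200, n=100, seed=0):
    """Count which null predictor offers the best root split in each trial."""
    rng = random.Random(seed)
    wins = {"binary": 0, "ten_level": 0, "continuous": 0}
    for _ in range(trials):
        # Response and predictors are drawn independently: none is informative.
        y = [rng.randint(0, 1) for _ in range(n)]
        predictors = {
            "binary": [rng.randint(0, 1) for _ in range(n)],
            "ten_level": [rng.randint(0, 9) for _ in range(n)],
            "continuous": [rng.random() for _ in range(n)],
        }
        gains = {name: best_gini_gain(x, y) for name, x in predictors.items()}
        wins[max(gains, key=gains.get)] += 1
    return wins


wins = simulate()
print(wins)
```

If selection were unbiased, each predictor would win about a third of the trials. Under these assumptions the predictors with more candidate cutpoints tend to win more often, because the maximum is taken over more candidate splits.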

Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, Torsten Hothorn

BMCBI 2007 | Random Forest | Variable Importance | Variable Selection |

Added	12 Dec 2010
Updated	12 Dec 2010
Type	Journal
Year	2007
Where	BMCBI
Authors	Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, Torsten Hothorn
