Background: Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. Results: In the case when both predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under HA and was unbiased under H0. Scaled VIMs were clearly biased under HA and H0. Conclusions: Unconditional unscaled VIMs are a computationally tractable choice for large data...
Kristin K. Nicodemus, James D. Malley, Carolin Str