Towards a Better Understanding of Random Forests through the Study of Strength and Correlation



In this paper we present a study of the Random Forest (RF) family of ensemble methods. From our point of view, a "classical" RF induction process has two main drawbacks: (i) the number of trees has to be fixed a priori; (ii) trees are added to the ensemble independently, and thus arbitrarily, because of the randomization principle. Hence, this kind of process offers no guarantee that all the trees will cooperate well within the same committee. In this work we therefore study the RF mechanisms that explain this cooperation by analysing, for particular subsets of trees called sub-forests, the link between accuracy and properties such as Strength and Correlation. We show that these properties, through the Correlation/Strength² ratio, should be taken into account to explain sub-forest performance.

Key words: Classification, Ensemble Method, Ensemble of Classifiers, Classifier Selection, Random Forests, Decision Trees
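The abstract names Strength and Correlation without restating how they are computed. The sketch below is one possible way to estimate them for a sub-forest, assuming the standard margin-based definitions from Breiman's 2001 "Random Forests" paper: strength is the mean margin, and mean correlation is the variance of the margin divided by the squared mean standard deviation of the per-tree raw margins. The function name `strength_and_correlation`, its arguments, and the use of a held-out set of per-tree predictions are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def strength_and_correlation(tree_preds, y_true, n_classes):
    """Estimate strength s, mean correlation c, and c / s**2 for a sub-forest.

    tree_preds: integer labels, shape (n_trees, n_samples), one row per tree.
    y_true:     integer labels, shape (n_samples,).
    Assumes Breiman's (2001) margin-based definitions (not restated in the abstract).
    """
    tree_preds = np.asarray(tree_preds)
    y_true = np.asarray(y_true)
    n_trees, n_samples = tree_preds.shape
    idx = np.arange(n_samples)

    # votes[i, j]: fraction of trees that predict class j for sample i.
    votes = np.zeros((n_samples, n_classes))
    for preds in tree_preds:
        votes[idx, preds] += 1.0
    votes /= n_trees

    # j_hat[i]: the most voted class other than the true one.
    masked = votes.copy()
    masked[idx, y_true] = -1.0
    j_hat = masked.argmax(axis=1)

    # Margin of the sub-forest on each sample; strength is its mean.
    margin = votes[idx, y_true] - votes[idx, j_hat]
    strength = margin.mean()

    # Raw margin of each tree: +1 if correct, -1 if it votes j_hat, else 0.
    raw = (tree_preds == y_true).astype(float) - (tree_preds == j_hat).astype(float)
    mean_sd = raw.std(axis=1).mean()

    # Mean correlation; undefined when every tree has a constant raw margin.
    correlation = margin.var() / mean_sd**2 if mean_sd > 0 else float("nan")
    return strength, correlation, correlation / strength**2


# Toy sub-forest: 3 trees, 4 samples, 2 classes.
preds = [[0, 0, 1, 1],
         [0, 1, 1, 1],
         [0, 0, 1, 0]]
s, c, ratio = strength_and_correlation(preds, [0, 0, 1, 1], n_classes=2)
print(s, c, ratio)
```

Under these definitions a lower Correlation/Strength² ratio corresponds to a tighter bound on the generalization error, so the same function could be applied to different subsets of rows of `tree_preds` to compare candidate sub-forests.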

Simon Bernard, Laurent Heutte, Sébastien Adam


Applied Computing | ICIC 2009 | Random Forests | Randomization Principle | RF Induction Process |


Added	19 Feb 2011
Updated	19 Feb 2011
Type	Journal
Year	2009
Where	ICIC
Authors	Simon Bernard, Laurent Heutte, Sébastien Adam
