ECML
1997
Springer

Global Data Analysis and the Fragmentation Problem in Decision Tree Induction

We investigate an inherent limitation of top-down decision tree induction in which the continuous partitioning of the instance space progressively lessens the statistical support of every partial (i.e. disjunctive) hypothesis, known as the fragmentation problem. We show, both theoretically and empirically, how the fragmentation problem adversely affects predictive accuracy as variation r (a measure of concept difficulty) increases. Applying feature-construction techniques at every tree node, which we implement on a decision tree inducer DALI, is proved to only partially solve the fragmentation problem. Our study illustrates how a more robust solution must also assess the value of each partial hypothesis by recurring to all available training data, an approach we name global data analysis, which decision tree induction alone is unable to accomplish. The value of global data analysis is evaluated by comparing modified versions of C4.5rules with C4.5trees and DALI, on both artificial and real-...
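The loss of statistical support described in the abstract can be illustrated with a small arithmetic sketch. This is not the authors' analysis or code: it assumes an idealized tree in which every split divides the examples evenly among its branches, and the function names `support_per_node` and `support_by_depth` are hypothetical, introduced only for illustration.

```python
def support_per_node(n_examples: int, depth: int, branching: int = 2) -> float:
    """Average number of training examples reaching a node at `depth`.

    Assumption (not from the paper): each split divides the examples
    evenly among `branching` children.
    """
    return n_examples / (branching ** depth)


def support_by_depth(n_examples: int, max_depth: int, branching: int = 2) -> list:
    """Support at every depth from the root (depth 0) down to `max_depth`."""
    return [support_per_node(n_examples, d, branching) for d in range(max_depth + 1)]


if __name__ == "__main__":
    # With 1024 training examples and binary splits, the support behind
    # each partial hypothesis halves at every level of the tree.
    for depth, support in enumerate(support_by_depth(1024, 6)):
        print(f"depth {depth}: about {support:g} examples per node")
```

Under this even-split assumption the support shrinks geometrically with depth, which is the effect the abstract calls fragmentation. Evaluating each partial hypothesis against all training data, as in the global data analysis the abstract describes, would not be subject to this shrinkage.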
Ricardo Vilalta, Gunnar Blix, Larry A. Rendell
Added 07 Aug 2010
Updated 07 Aug 2010
Type Conference
Year 1997
Where ECML
Authors Ricardo Vilalta, Gunnar Blix, Larry A. Rendell