Most existing decision tree systems use a greedy approach to induce trees: locally optimal splits are induced at every node of the tree. Although the greedy approach is suboptimal, it is believed to produce reasonably good trees. In the current work, we attempt to verify this belief. We quantify the goodness of greedy tree induction empirically, using the popular decision tree algorithms C4.5 and CART. We induce decision trees on thousands of synthetic data sets and compare them to the corresponding optimal trees, which in turn are found using a novel map coloring idea. We measure the effect on greedy induction of variables such as the underlying concept complexity, training set size, noise and dimensionality. Our experiments show, among other things, that the expected classification cost of a greedily induced tree is consistently very close to that of the optimal tree.
Sreerama K. Murthy, Steven Salzberg
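As a rough illustration only (not the authors' code or the exact C4.5/CART criteria), the greedy step that such learners repeat at every node amounts to choosing the single split that minimizes a local impurity measure. The sketch below uses Gini impurity and a toy data set, both of which are assumptions for illustration.

```python
# Minimal sketch of greedy split selection (illustrative; Gini impurity and
# the toy data are assumptions, not taken from the paper).
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold) minimizing weighted child impurity."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left  = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, t, score)
    return best[:2]

# Toy data: two numeric features, binary class labels.
X = [[2.0, 1.0], [3.0, 1.5], [1.0, 3.0], [2.5, 3.5]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # greedy choice here: feature 1, threshold 1.5
```

A greedy inducer applies this choice recursively to each child node, committing to the locally best split without looking ahead; the paper's question is how far such trees fall short of globally optimal ones.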