SDM
2010
SIAM

A Robust Decision Tree Algorithm for Imbalanced Data Sets

We propose a new decision tree algorithm, Class Confidence Proportion Decision Tree (CCPDT), which is robust and insensitive to class distribution and generates rules which are statistically significant. In order to make decision trees robust, we begin by expressing Information Gain, the metric used in C4.5, in terms of confidence of a rule. This allows us to immediately explain why Information Gain, like confidence, results in rules which are biased towards the majority class. To overcome this bias, we introduce a new measure, Class Confidence Proportion (CCP), which forms the basis of CCPDT. To generate rules which are statistically significant we design a novel and efficient top-down and bottom-up approach which uses Fisher's exact test to prune branches of the tree which are not statistically significant. Together these two changes yield a classifier that performs statistically better than not only traditional decision trees but also trees learned from data that has been bala...
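The abstract names the ingredients of CCPDT but does not give their formulas. The sketch below illustrates them on a single 2x2 contingency table at a tree node. Several parts are assumptions made here, not taken from the text above:

- The "class confidence" of a rule X -> y is taken to be the share of class-y instances that X covers.
- CCP is taken to be that value normalised against the same quantity for X -> not y.
- The function names and the toy counts are hypothetical.

The Fisher's exact test shown is the standard one-sided test on a 2x2 table. The paper's combined top-down and bottom-up pruning procedure is not reproduced here.

```python
from math import comb, log2

# Counts for a rule "X -> y" at a tree node (illustrative 2x2 contingency table):
#   a = |X and y|      b = |X and not y|
#   c = |not X and y|  d = |not X and not y|


def confidence(a: int, b: int) -> float:
    """Conventional rule confidence P(y | X) = a / (a + b)."""
    return a / (a + b)


def entropy(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def info_gain(a: int, b: int, c: int, d: int) -> float:
    """Information Gain of splitting on X; the child terms depend only on the
    confidences of X -> y and (not X) -> y, plus the coverage of each branch."""
    n = a + b + c + d
    parent = entropy((a + c) / n)
    children = (a + b) / n * entropy(confidence(a, b)) + (c + d) / n * entropy(confidence(c, d))
    return parent - children


def class_confidence(hits: int, misses: int) -> float:
    """ASSUMED definition: share of a class's instances covered by X."""
    return hits / (hits + misses)


def ccp(a: int, b: int, c: int, d: int) -> float:
    """ASSUMED CCP: class confidence of X -> y, normalised against X -> not y."""
    cc_pos = class_confidence(a, c)  # fraction of class-y instances covered by X
    cc_neg = class_confidence(b, d)  # fraction of not-y instances covered by X
    total = cc_pos + cc_neg
    return cc_pos / total if total > 0 else 0.0


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test: P(count >= a) under the hypergeometric
    distribution with the table's row and column totals held fixed."""
    covered = a + b     # instances matched by X
    positives = a + c   # instances of class y
    n = a + b + c + d
    upper = min(covered, positives)
    tail = sum(comb(positives, k) * comb(n - positives, covered - k)
               for k in range(a, upper + 1))
    return tail / comb(n, covered)


if __name__ == "__main__":
    # Imbalanced toy node: 50 minority (y) vs 950 majority instances.
    a, b, c, d = 40, 60, 10, 890
    print(f"confidence(X -> y) = {confidence(a, b):.2f}")
    print(f"CCP(X -> y)        = {ccp(a, b, c, d):.2f}")
    print(f"information gain   = {info_gain(a, b, c, d):.3f}")
    print(f"Fisher p-value     = {fisher_exact_greater(a, b, c, d):.2e}")
```

On this toy table, conventional confidence for X -> y is 0.40, so a confidence-driven leaf would take the majority label. The assumed CCP is about 0.93, because X covers 80% of the minority class but only about 6% of the majority class. A very small Fisher p-value is the kind of evidence a significance-based pruning step could use to keep such a branch.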
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where SDM
Authors Wei Liu, Sanjay Chawla, David A. Cieslak, Nitesh V. Chawla