In machine learning, decision trees are used extensively to solve classification problems. Designing a decision tree classifier typically involves two main phases. The first phase grows the tree, quite often to its maximum size, using a set of data called the training data. The second phase prunes the tree, producing a smaller tree with better generalization (smaller error on unseen data). One of the most popular decision tree classifiers in the literature is the C4.5 decision tree classifier. In this paper, we introduce an additional phase, called the adjustment phase, inserted between the growing and pruning phases of the C4.5 decision tree classifier. The intent of this adjustment phase is to reduce the C4.5 error rate by adjusting the non-optimal splits created in the growing phase, thereby improving generalization (accuracy of the tree on unseen data). In most of the simulations conducted with th...
Jason R. Beck, Maria Garcia, Mingyu Zhong, Michael