Evolutionary Training Set Selection to Optimize C4.5 in Imbalanced Problems

Classification in imbalanced domains is a recent challenge in machine learning. A classification problem is imbalanced when the data contain many examples of one class and few of the other, and the under-represented class is the one of greater interest. One of the most widely used techniques for tackling this problem is to preprocess the data before the learning process. This preprocessing can be done through under-sampling, which removes examples, mainly from the majority class, or through over-sampling, which replicates or generates new minority examples. This contribution proposes an under-sampling procedure based on evolutionary algorithms that performs training set selection to optimize the models obtained by the C4.5 decision tree. The proposal has been compared with other under-sampling and over-sampling techniques; the results are very competitive in terms of accuracy, and the obtained models are more interpretable.
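
The idea of evolutionary under-sampling can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration. A binary mask over the majority examples plays the role of a chromosome, a 1-nearest-neighbour rule stands in for C4.5 to keep the example self-contained, and fitness is the geometric mean of the per-class accuracies on a held-out validation set. The names `evolve_selection`, `g_mean`, `predict_1nn`, and `make_data`, the toy one-dimensional data, and all parameter values are hypothetical.

```python
import random
from math import sqrt

def predict_1nn(x, train):
    """Label of the training point closest to x (1-D features)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def g_mean(train, valid):
    """Geometric mean of the per-class accuracies on `valid`."""
    hits, totals = [0, 0], [0, 0]
    for x, y in valid:
        totals[y] += 1
        hits[y] += int(predict_1nn(x, train) == y)
    return sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))

def make_data(rng, n_major, n_minor):
    """Toy imbalanced data: class 0 (majority) near 0.0, class 1 (minority) near 1.5."""
    major = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    minor = [(rng.gauss(1.5, 1.0), 1) for _ in range(n_minor)]
    return major, minor

def evolve_selection(major, minor, valid, rng,
                     pop_size=16, generations=20, p_mut=0.02):
    """Search for a binary mask over majority examples; minority examples are always kept."""
    def fitness(mask):
        train = [p for p, keep in zip(major, mask) if keep] + minor
        return g_mean(train, valid)

    n = len(major)
    # Start from the "keep everything" mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
major, minor = make_data(rng, 60, 8)
valid_major, valid_minor = make_data(rng, 60, 8)
valid = valid_major + valid_minor
baseline = g_mean(major + minor, valid)  # no under-sampling
best_mask, best_fit = evolve_selection(major, minor, valid, rng)
print(f"kept {sum(best_mask)} of {len(major)} majority examples")
print(f"G-mean without selection: {baseline:.3f}, with selection: {best_fit:.3f}")
```

Because the initial population contains the mask that keeps every majority example and the best individuals always survive, the selected subset never scores below the unsampled training set on the validation data used for fitness. That guarantee does not extend to unseen data, so a separate test set would be needed to judge the result fairly.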

Salvador García, Francisco Herrera

HIS 2008 | Imbalanced Classification | Imbalanced Domains | Information Technology | Representative Class |

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	HIS
Authors	Salvador García, Francisco Herrera