FLAIRS
2008

Building Useful Models from Imbalanced Data with Sampling and Boosting

Building useful classification models can be a challenging endeavor, especially when training data is imbalanced. Class imbalance presents a problem when traditional classification algorithms are applied. These algorithms often attempt to build models with the goal of maximizing overall classification accuracy. While such a model may be very accurate, it is often not very useful. Consider the domain of software quality prediction, where the goal is to identify program modules that are most likely to contain faults. Since these modules make up only a small fraction of the entire project, a highly accurate model may be generated by classifying all examples as not fault-prone. Such a model would be useless. To alleviate the problems associated with class imbalance, several techniques have been proposed. We examine two such techniques: data sampling and boosting. Five data sampling techniques and one commonly used boosting algorithm are applied to five datasets from the software quality pr...
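The abstract's point about accuracy can be illustrated with a small sketch. The data below is hypothetical (1000 modules, 50 of them fault-prone), and the helper names are illustrative, not taken from the paper. Random undersampling is shown only as one common form of data sampling; the truncated abstract does not name the five techniques the authors evaluated, so this should not be read as their method.

```python
import random

def majority_baseline(labels):
    """Predict the most common label for every example."""
    majority = max(set(labels), key=labels.count)
    return [majority] * len(labels)

def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive examples that were predicted positive."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predicted_for_positives) / len(predicted_for_positives)

def random_undersample(examples, labels, positive=1, seed=0):
    """Keep every minority example and an equal-sized random subset of the rest.

    Assumes the positive class is the minority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == positive]
    neg = [i for i, y in enumerate(labels) if y != positive]
    keep = pos + rng.sample(neg, len(pos))
    rng.shuffle(keep)
    return [examples[i] for i in keep], [labels[i] for i in keep]

# Hypothetical dataset: label 1 = fault-prone, label 0 = not fault-prone.
labels = [1] * 50 + [0] * 950
modules = list(range(len(labels)))  # stand-ins for module feature vectors

# Classifying everything as "not fault-prone" scores well on accuracy
# while finding none of the faulty modules.
predictions = majority_baseline(labels)
baseline_accuracy = accuracy(labels, predictions)
baseline_recall = recall(labels, predictions)

# One possible remedy: balance the training set before fitting a model.
balanced_modules, balanced_labels = random_undersample(modules, labels)
```

Here the baseline reaches 95% accuracy with zero recall on the fault-prone class, which is why accuracy alone is a poor guide on imbalanced data. The undersampled set contains 50 examples of each class and could then be passed to any learner, including a boosting algorithm.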
Added: 02 Oct 2010
Updated: 02 Oct 2010
Type: Conference
Year: 2008
Where: FLAIRS
Authors: Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, Amri Napolitano