Building Useful Models from Imbalanced Data with Sampling and Boosting


Building useful classification models can be a challenging endeavor, especially when training data is imbalanced. Class imbalance presents a problem when traditional classification algorithms are applied. These algorithms often attempt to build models with the goal of maximizing overall classification accuracy. While such a model may be very accurate, it is often not very useful. Consider the domain of software quality prediction where the goal is to identify program modules that are most likely to contain faults. Since these modules make up only a small fraction of the entire project, a highly accurate model may be generated by classifying all examples as not fault prone. Such a model would be useless. To alleviate the problems associated with class imbalance, several techniques have been proposed. We examine two such techniques: data sampling and boosting. Five data sampling techniques and one commonly used boosting algorithm are applied to five datasets from the software quality pr...
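The accuracy problem described in the abstract can be illustrated with a small sketch. The numbers below (1000 modules, 50 of them fault prone) are hypothetical and are not taken from the paper's datasets. Random undersampling is shown only as one common example of a data sampling technique; the abstract does not name the five techniques the authors evaluate, and the helper names here are illustrative assumptions.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive examples that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def random_undersample(labels, positive=1, seed=0):
    """Return indices of a balanced subset: all minority (positive) examples
    plus an equally sized random sample of majority examples.
    Illustrative helper, not the paper's procedure."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == positive]
    neg_idx = [i for i, y in enumerate(labels) if y != positive]
    kept_neg = rng.sample(neg_idx, min(len(pos_idx), len(neg_idx)))
    return sorted(pos_idx + kept_neg)


# Hypothetical project: 1 = fault prone, 0 = not fault prone.
y_true = [1] * 50 + [0] * 950

# A trivial "model" that labels every module as not fault prone.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)   # high, because the majority class dominates
rec = recall(y_true, y_pred)     # zero, because no faulty module is found

# Balance the training data before fitting a real classifier.
balanced_idx = random_undersample(y_true)
balanced_labels = [y_true[i] for i in balanced_idx]

print(f"accuracy={acc:.2f} recall={rec:.2f} balanced_size={len(balanced_labels)}")
```

The trivial model scores well on accuracy while identifying none of the fault-prone modules, which is why a metric focused on the minority class, or a rebalanced training set, is usually needed for imbalanced data.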

Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van H


Artificial Intelligence | Class Imbalance | FLAIRS 2008 | Software Quality Prediction | Training Data


Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2008
Where	FLAIRS
Authors	Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, Amri Napolitano


