Building Useful Models from Imbalanced Data with Sampling and Boosting


Available at www.aaai.org

Building useful classification models can be challenging, especially when the training data is imbalanced. Class imbalance causes problems for traditional classification algorithms, which typically try to build models that maximize overall classification accuracy. Such a model may be very accurate, yet not very useful. Consider software quality prediction, where the goal is to identify the program modules most likely to contain faults. Because these modules make up only a small fraction of the entire project, a model that classifies every example as not fault-prone can achieve high accuracy. Such a model would be useless, since it never identifies a single fault-prone module. Several techniques have been proposed to alleviate the problems associated with class imbalance. We examine two of them: data sampling and boosting. Five data sampling techniques and one commonly used boosting algorithm are applied to five datasets from the software quality pr...
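The accuracy problem described above can be made concrete with a small sketch. It assumes a hypothetical project of 100 modules, 5 of them fault-prone, and uses random undersampling purely as one illustrative data sampling technique. The abstract does not name the five sampling techniques or the boosting algorithm the authors evaluate, so this is not their method; the helper names (`accuracy`, `recall`, `random_undersample`) are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (fault-prone modules) that were predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0


def random_undersample(X, y, majority=0, seed=0):
    """Randomly discard majority-class examples until both classes have equal size.

    Assumes the majority class is at least as large as the minority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = sorted(rng.sample(majority_idx, len(minority_idx)) + minority_idx)
    return [X[i] for i in kept], [y[i] for i in kept]


# Hypothetical project: 95 not-fault-prone modules (0) and 5 fault-prone modules (1).
y = [0] * 95 + [1] * 5
X = list(range(len(y)))  # placeholder feature values, one per module

# A trivial "model" that labels every module as not fault-prone.
always_negative = [0] * len(y)
print(accuracy(y, always_negative))  # 0.95
print(recall(y, always_negative))    # 0.0

# After undersampling, the training set is balanced: 5 of each class.
X_bal, y_bal = random_undersample(X, y)
print(len(y_bal), sum(y_bal))  # 10 5
```

The trivial model scores 95% accuracy while finding none of the fault-prone modules, which is why accuracy alone is a poor guide under imbalance. Undersampling trades away majority-class data for balance; oversampling the minority class and boosting are alternative ways to shift a learner's attention toward the rare class.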

Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, Amri Napolitano


Artificial Intelligence | Class Imbalance | FLAIRS 2008 | Software Quality Prediction | Training Data |


» Selecting Minority Examples from Misclassified Data for Over-Sampling

» Learning to improve area-under-FROC for imbalanced medical data classification using an ense...

» RAMOBoost: Ranked minority oversampling in boosting

» Learning from Heterogeneous Sources via Gradient Boosting Consensus

» Improving supervised learning for meeting summarization using sampling and regression

» Building Classification Models from Microarray Data with Tree-Based Classification Algorith...

» Building Outline Extraction from Digital Elevation Models Using Marked Point Processes

» A Vector Space Model for Subjectivity Classification in Urdu aided by Co-Training

Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2008
Where	FLAIRS
Authors	Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, Amri Napolitano


Sciweavers
