Automatically countering imbalance and its empirical relationship to cost

Learning from imbalanced datasets presents a convoluted problem from both the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively rarely, such as in cases of fraud, instances of disease, and regions of interest in large-scale simulations, there is a correspondingly high cost for the misclassification of rare events. Under such circumstances, the data set is often resampled to generate models with high minority class accuracy. However, sampling methods face a common but important criticism: how can the amount and type of sampling be discovered automatically? To address this problem, we propose a wrapper paradigm that discovers the amount of resampling for a data set by optimizing evaluation functions such as the f-measure, Area Under the ROC Curve (AUROC), cost, cost curves, and the cost-dependent f-measure. Our analysis of the wrapper is two-fold. First, we report the interaction between different evaluation and wrapper optimization func...
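The wrapper idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single resampling type (random undersampling of the majority class), a toy k-nearest-neighbour classifier, plain unstratified cross-validation, and the f-measure as the only evaluation function. All function names (`f_measure`, `undersample_majority`, `knn_predict`, `wrapper_search`) are hypothetical and chosen for illustration.

```python
import random


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for the positive (minority) class; 0.0 if no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def undersample_majority(X, y, keep_fraction, rng, majority=0):
    """Keep a random fraction of majority examples and all other examples."""
    maj = [i for i, label in enumerate(y) if label == majority]
    rest = [i for i, label in enumerate(y) if label != majority]
    n_keep = max(1, round(keep_fraction * len(maj)))
    kept = rng.sample(maj, n_keep) + rest
    return [X[i] for i in kept], [y[i] for i in kept]


def knn_predict(train_X, train_y, x, k=5):
    """Toy k-NN for 0/1 labels (assumed stand-in for the real base learner)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=dist)[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def wrapper_search(X, y, fractions, n_folds=5, seed=0):
    """Pick the undersampling fraction with the best cross-validated f-measure."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    best_fraction, best_score = None, -1.0
    for frac in fractions:
        scores = []
        for held in folds:
            held_set = set(held)
            tr = [i for i in order if i not in held_set]
            # Resample only the training portion; evaluate on untouched data.
            tr_X, tr_y = undersample_majority(
                [X[i] for i in tr], [y[i] for i in tr], frac, rng
            )
            preds = [knn_predict(tr_X, tr_y, X[i]) for i in held]
            scores.append(f_measure([y[i] for i in held], preds))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_fraction, best_score = frac, mean_score
    return best_fraction, best_score
```

Calling `wrapper_search(X, y, [0.25, 0.5, 1.0])` on feature tuples `X` and 0/1 labels `y` returns the candidate fraction with the highest mean f-measure together with that score. Swapping `f_measure` for a cost-based score would be one way to explore the relationship between the optimization function and cost that the abstract mentions.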

Nitesh V. Chawla, David A. Cieslak, Lawrence O. Ha

Cost-sensitive Classifiers | Cost-sensitive Environment | DATAMINE 2008 | Minority Class Accuracy |


Added	10 Dec 2010
Updated	10 Dec 2010
Type	Journal
Year	2008
Where	DATAMINE
Authors	Nitesh V. Chawla, David A. Cieslak, Lawrence O. Hall, Ajay Joshi
