Sciweavers

DATAMINE
2008

Automatically countering imbalance and its empirical relationship to cost

Learning from imbalanced data sets presents a convoluted problem from both the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively rarely, such as in cases of fraud, instances of disease, and regions of interest in large-scale simulations, there is a correspondingly high cost for the misclassification of rare events. Under such circumstances, the data set is often resampled to generate models with high minority-class accuracy. However, sampling methods face a common and important criticism: how can the amount and type of sampling be discovered automatically? To address this problem, we propose a wrapper paradigm that discovers the amount of resampling for a data set by optimizing evaluation functions such as the f-measure, Area Under the ROC Curve (AUROC), cost, cost curves, and the cost-dependent f-measure. Our analysis of the wrapper is two-fold. First, we report the interaction between different evaluation and wrapper optimization func...
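To make the wrapper idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes random undersampling of the majority class as the only resampling type, a small k-nearest-neighbour classifier as a stand-in learner, a single held-out validation split, and the f-measure as the evaluation function. All function names (`f_measure`, `knn_predict`, `undersample`, `wrapper_search`, `make_toy_data`) and the toy data are hypothetical and chosen only for illustration.

```python
import random


def f_measure(y_true, y_pred, positive=1):
    """F1 score for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def knn_predict(train, x, k=5):
    """Majority vote among the k nearest training rows (stand-in learner)."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x))
    nearest = sorted(train, key=squared_distance)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def undersample(train, keep_fraction, rng):
    """Keep every minority row and a random fraction of the majority rows."""
    minority = [row for row in train if row[1] == 1]
    majority = [row for row in train if row[1] == 0]
    n_keep = min(len(majority), max(1, round(len(majority) * keep_fraction)))
    return minority + rng.sample(majority, n_keep)


def wrapper_search(train, valid, fractions, score=f_measure, seed=0):
    """Try each candidate sampling amount; return the best (fraction, score)."""
    rng = random.Random(seed)
    y_valid = [label for _, label in valid]
    best_fraction, best_score = None, -1.0
    for fraction in fractions:
        sampled = undersample(train, fraction, rng)
        predictions = [knn_predict(sampled, x) for x, _ in valid]
        current = score(y_valid, predictions)
        if current > best_score:
            best_fraction, best_score = fraction, current
    return best_fraction, best_score


def make_toy_data(n_majority, n_minority, rng):
    """Two overlapping 2-D Gaussian blobs; label 1 is the rare class."""
    majority = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0)
                for _ in range(n_majority)]
    minority = [((rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)), 1)
                for _ in range(n_minority)]
    return majority + minority


if __name__ == "__main__":
    rng = random.Random(42)
    train = make_toy_data(300, 15, rng)
    valid = make_toy_data(150, 8, rng)
    fraction, score = wrapper_search(train, valid, [1.0, 0.5, 0.25, 0.1, 0.05])
    print("selected majority keep fraction:", fraction, "f-measure:", score)
```

Passing a different `score` function (for example, one based on misclassification cost) swaps the optimization target without changing the search loop, which is the property the abstract attributes to the wrapper. A fuller version would likely use cross-validation rather than one split and would also search over oversampling amounts.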
Added 10 Dec 2010
Updated 10 Dec 2010
Type Journal
Year 2008
Where DATAMINE
Authors Nitesh V. Chawla, David A. Cieslak, Lawrence O. Hall, Ajay Joshi