Cost-Sensitive Learning vs. Sampling: Which is Best for Handling Unbalanced Classes with Unequal Error Costs?

The classifier built from a data set with a highly skewed class distribution generally predicts the more frequently occurring classes much more often than the infrequently occurring classes. This is largely due to the fact that most classifiers are designed to maximize accuracy. In many instances, such as for medical diagnosis, this classification behavior is unacceptable because the minority class is the class of primary interest (i.e., it has a much higher misclassification cost than the majority class). In this paper we compare three methods for dealing with data that has a skewed class distribution and nonuniform misclassification costs. The first method incorporates the misclassification costs into the learning algorithm while the other two methods employ oversampling or undersampling to make the training data more balanced. In this paper we empirically compare the effectiveness of these methods in order to determine which produces the best overall classifier, and under what ci...

Gary M. Weiss, Kate McCarthy, Bibi Zabar
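
The three approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`oversample`, `undersample`, `cost_sensitive_predict`), the random duplication and random removal strategies, and the minimum-expected-cost decision rule are all assumptions chosen for illustration. In particular, the decision rule is a simple stand-in for cost-sensitive learning, which in the paper is built into the learning algorithm itself.

```python
import random


def _group_by_class(rows, label_of):
    """Group rows by class label (hypothetical helper)."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups


def oversample(rows, label_of, rng):
    """Randomly duplicate rows until every class is as large as the largest class."""
    groups = _group_by_class(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def undersample(rows, label_of, rng):
    """Randomly discard rows until every class is as small as the smallest class."""
    groups = _group_by_class(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Keep a random subset, drawn without replacement.
        balanced.extend(rng.sample(group, target))
    return balanced


def cost_sensitive_predict(p_positive, cost_fn, cost_fp):
    """Predict positive when that choice has the lower expected cost.

    Expected cost of predicting negative: p_positive * cost_fn
    Expected cost of predicting positive: (1 - p_positive) * cost_fp
    """
    return p_positive * cost_fn > (1 - p_positive) * cost_fp


# Illustrative data only: 90 majority rows (label 0), 10 minority rows (label 1).
rows = [(i, 0) for i in range(90)] + [(i, 1) for i in range(10)]
rng = random.Random(0)
label_of = lambda row: row[1]

over = oversample(rows, label_of, rng)
under = undersample(rows, label_of, rng)
```

With a 90/10 split, `oversample` returns 180 rows and `undersample` returns 20, each with equal class counts. With a false-negative cost ten times the false-positive cost, `cost_sensitive_predict(0.2, 10, 1)` predicts the minority class even at an estimated probability of 0.2, whereas equal costs would not.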

Data Mining | DMIN 2007 | Misclassification Costs | Nonuniform Misclassification Costs | Skewed Class Distribution

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	DMIN
Authors	Gary M. Weiss, Kate McCarthy, Bibi Zabar
