Sciweavers

FLAIRS
2007

A Distance-Based Over-Sampling Method for Learning from Imbalanced Data Sets

Many real-world domains present the problem of imbalanced data sets, where examples of one class significantly outnumber examples of the other classes. This makes learning difficult, because learning algorithms that optimize accuracy over all training examples tend to classify every example as belonging to the majority class. We introduce a method that addresses this problem by creating a balanced data set, which improves classifier performance. Our method over-samples the minority class, using a randomized weighted distance scheme to generate synthetic examples in the neighborhood of each minority example.
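The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of what distance-weighted over-sampling could look like, not the authors' algorithm. It assumes that a neighbor is drawn from the k nearest minority examples with probability inversely proportional to its distance, and that the synthetic point is a random interpolation between the two. The function name `oversample_minority` and the parameters `n_new`, `k`, and `seed` are hypothetical.

```python
import math
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points near the given minority examples.

    Assumed scheme (not taken from the paper): pick a minority example,
    choose one of its k nearest minority neighbors with probability
    inversely proportional to distance, then interpolate between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i in range(n_new):
        base_index = i % len(minority)  # cycle through the minority examples
        base = minority[base_index]
        # Distances from the base point to every other minority example.
        others = [(math.dist(base, p), p)
                  for j, p in enumerate(minority) if j != base_index]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        # Closer neighbors get larger weights; epsilon guards against d == 0.
        weights = [1.0 / (d + 1e-12) for d, _ in nearest]
        _, neighbor = rng.choices(nearest, weights=weights, k=1)[0]
        t = rng.random()  # random position on the segment base -> neighbor
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Four minority examples at the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = oversample_minority(minority, n_new=6, k=2)
```

Because each synthetic point is a convex combination of two existing minority examples, it stays inside the region those examples already span. To balance a data set under this sketch, `n_new` would be set to the difference between the majority and minority class sizes.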
Jorge de la Calleja, Olac Fuentes
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2007
Where FLAIRS
Authors Jorge de la Calleja, Olac Fuentes