Generative Oversampling for Mining Imbalanced Datasets


Download www.ideal.ece.utexas.edu

One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, desired class distribution in the training data is achieved. Many resampling techniques have been proposed in the past, and the relationship between resampling and cost-sensitive learning has been well studied. Surprisingly, however, few resampling techniques attempt to create new, artificial data points which generalize the known, labeled data. In this paper, we introduce an easily implementable resampling technique (generative oversampling) which creates new data points by learning from available training data. Empirically, we demonstrate that generative oversampling outperforms other well-known resampling methods on several datasets in the example domain of text classification.
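The idea of creating new minority-class points from a learned model can be illustrated with a minimal Python sketch. The abstract does not say which generative model the authors use, so everything specific here is an assumption made for illustration: a Laplace-smoothed multinomial over term counts (a common choice for text), document lengths resampled from the observed minority documents, and the hypothetical names `generative_oversample`, `target_count`, and `alpha`. It should not be read as the paper's implementation.

```python
import numpy as np

def generative_oversample(X, y, minority_label, target_count, alpha=1.0, seed=0):
    """Add synthetic minority rows until the minority class has target_count rows.

    Assumption (not taken from the paper): the minority class is modeled as a
    single multinomial over terms, smoothed with a Laplace prior `alpha`.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)

    X_min = X[y == minority_label]
    n_new = target_count - len(X_min)
    if n_new <= 0:
        return X, y  # already at or above the desired class size

    # Learn term probabilities from the available minority training data.
    term_totals = X_min.sum(axis=0) + alpha
    probs = term_totals / term_totals.sum()

    # Draw a length for each synthetic document from the observed minority lengths.
    lengths = rng.choice(X_min.sum(axis=1), size=n_new, replace=True)

    # Sample each synthetic document's term counts from the learned multinomial.
    X_new = np.vstack([rng.multinomial(int(n), probs) for n in lengths])
    y_new = np.full(n_new, minority_label)

    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Toy term-count matrix: 2 minority documents (label 1), 4 majority documents (label 0).
X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 1], [1, 3, 2], [0, 5, 0], [0, 0, 6]])
y = np.array([1, 1, 0, 0, 0, 0])
X_bal, y_bal = generative_oversample(X, y, minority_label=1, target_count=4)
```

Unlike oversampling by replication, the synthetic rows are not copies of existing documents; they are drawn from a distribution fit to the minority class, which is the property the abstract emphasizes.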

Alexander Liu, Joydeep Ghosh, Cheryl Martin


Data Mining | Data Points | DMIN 2007 | Many Resampling Techniques | Resampling |


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	DMIN
Authors	Alexander Liu, Joydeep Ghosh, Cheryl Martin


Sciweavers
