Improving SVM Classification on Imbalanced Data Sets in Distance Spaces



Abstract. Imbalanced data sets present a particular challenge to the data mining community. Often it is the rare event that is of interest, and the cost of misclassifying the rare event is higher than that of misclassifying the usual event. When the data is highly skewed toward the usual event, it can be very difficult for a learning system to detect the rare event accurately. Many approaches to handling imbalanced data sets have been proposed in recent years, ranging from under-sampling the majority class to adding synthetic points to the minority class in feature space. Distances between time series are known to be non-Euclidean and nonmetric, since comparing time series requires warping in time. This makes it impossible to apply standard methods such as SMOTE, which insert synthetic data points in feature space. We present an innovative approach that augments the minority class by adding synthetic points in distance spaces. We then use Support Vector Machines for classification. Our experimental results on ...
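To make the claim that warping-based time-series distances are nonmetric concrete, here is a minimal sketch using dynamic time warping (DTW). The abstract does not name a specific distance, so the choice of DTW, its absolute-difference local cost, and the toy series below are assumptions for illustration only; this is not the authors' method.

```python
import math
from typing import Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance using |a_i - b_j| as the local cost (illustrative choice)."""
    n, m = len(a), len(b)
    # D[i][j] holds the cost of the cheapest warping path aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # A step may repeat a point of either series (warping) or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Hypothetical toy series of different lengths.
x, y, z = [0.0], [0.0, 1.0], [1.0, 1.0, 1.0]
d_xy, d_yz, d_xz = dtw(x, y), dtw(y, z), dtw(x, z)
print(d_xy, d_yz, d_xz)  # 1.0 1.0 3.0
print("triangle inequality holds:", d_xz <= d_xy + d_yz)  # False

# Two different series at distance zero: warping absorbs the repeated 0.0.
print(dtw([0.0, 1.0], [0.0, 0.0, 1.0]))  # 0.0
```

Because DTW can violate the triangle inequality and can assign distance zero to two distinct series, a set of series cannot in general be embedded isometrically as points in a Euclidean feature space. SMOTE-style interpolation between a minority point and one of its neighbours therefore has no direct counterpart, which is the gap the abstract's distance-space approach addresses.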

Suzan Koknar-Tezel, Longin Jan Latecki


Data Mining | ICDM 2009 | Rare Events | Synthetic Points | Time Series


Type	Conference
Year	2009
Where	ICDM
