
NCA
2007
IEEE

A data reduction approach for resolving the imbalanced data issue in functional genomics

Learning from imbalanced data arises frequently in machine learning applications. In scientific applications, a ratio of one positive example to thousands of negative instances is common. Unfortunately, traditional machine learning techniques often treat rare instances as noise. One popular approach to this difficulty is to resample the training data. However, resampling results in a high number of false positive predictions. Hence, we propose preprocessing the training data by partitioning them into clusters. This greatly reduces the imbalance between minority and majority instances within each cluster. For moderate imbalance ratios, our technique gives better prediction accuracy than other resampling methods. For extreme imbalance ratios, this technique serves as a good filter that reduces the amount of imbalance so that traditional classification techniques can be deployed. More importantly, we have successfully applied our techniques to the splice site prediction and protein subcellular localization problems, with si...
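The abstract does not say which clustering algorithm is used or exactly how clusters are used downstream, so the following is only a minimal sketch of the general idea under stated assumptions. K-means (Lloyd's algorithm) is an assumed choice of clustering method, and discarding clusters that contain no minority instance is an assumed reading of "serves as a good filter". The helpers `kmeans`, `reduce_by_clusters`, and `imbalance_ratio`, the value of k, and the toy data are all hypothetical and are not the authors' code or data.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise centroids from k random points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def reduce_by_clusters(X, y, k, seed=0):
    """Hypothetical filter: cluster all instances, then keep only the clusters
    that contain at least one minority (y == 1) instance."""
    labels = kmeans(X, k, seed=seed)
    minority_clusters = {lab for lab, t in zip(labels, y) if t == 1}
    keep = [i for i, lab in enumerate(labels) if lab in minority_clusters]
    return [X[i] for i in keep], [y[i] for i in keep], labels


def imbalance_ratio(y):
    """Majority-to-minority ratio (negatives per positive)."""
    counts = Counter(y)
    return counts[0] / max(counts[1], 1)


# Synthetic toy data: 1000 negatives in four blobs, 5 positives near one blob.
rng = random.Random(42)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
X = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(250)]
y = [0] * len(X)
X += [(10 + rng.gauss(0, 1), 10 + rng.gauss(0, 1)) for _ in range(5)]
y += [1] * 5

X_red, y_red, labels = reduce_by_clusters(X, y, k=4)
print("imbalance before:", imbalance_ratio(y))
print("imbalance after: ", imbalance_ratio(y_red))
```

Because every positive instance is kept and only negatives are discarded, the ratio after filtering can never exceed the ratio before. How much it actually drops depends on how well the clustering isolates the minority region, and the value of k used here is arbitrary.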
Kihoon Yoon, Stephen Kwek
Added 27 Dec 2010
Updated 27 Dec 2010
Type Journal
Year 2007
Where NCA