
CASDMKM
2004
Springer


Data Set Balancing


This paper conducts experiments with three skewed data sets. It seeks to demonstrate the problems that arise when skewed data is used, and to identify the countervailing problems that arise when data is balanced. Three basic types of data mining algorithm are considered (decision trees, regression-based models, and neural networks), using both categorical and continuous data. Two of the data sets have binary outcomes, while the third has four possible outcomes. The key findings are as follows. When data is highly unbalanced, algorithms tend to degenerate by assigning all cases to the most common outcome. When data is balanced, accuracy rates tend to decline. Balancing also reduces the training set size, which can cause a different kind of model failure: cases encountered in the test set may have been omitted from training. Decision tree algorithms were found to be the most robust with respect to the degree of balancing applied.
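The abstract does not say how the data sets were balanced. As a minimal illustration only, the sketch below assumes random undersampling (discarding cases from the larger classes until every class matches the rarest one), which is one common way to balance data. It also computes the accuracy of the degenerate model that assigns every case to the most common outcome. The function names `undersample` and `majority_baseline_accuracy` are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly undersample every class down to the size of the rarest class.

    Assumption: balancing by random undersampling; the paper's exact
    balancing procedure is not stated in the abstract. Works for binary
    and multi-class outcomes alike.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep a random subset of n_min cases from each class.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def majority_baseline_accuracy(labels):
    """Accuracy of a degenerate model that predicts the most common outcome for every case."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


if __name__ == "__main__":
    # Hypothetical skewed binary data set: 95% "no", 5% "yes".
    labels = ["no"] * 950 + ["yes"] * 50
    rows = list(range(len(labels)))
    print(majority_baseline_accuracy(labels))  # 0.95
    bal_rows, bal_labels = undersample(rows, labels)
    print(len(bal_rows))  # 100
    print(majority_baseline_accuracy(bal_labels))  # 0.5
```

On the skewed data, predicting the majority outcome for every case already scores 95% accuracy, which is why a learner can drift toward that degenerate model. After balancing, the same trivial model drops to 50%, but the training set shrinks from 1,000 to 100 cases. That shrinkage is the trade-off the abstract warns about, since rarer attribute combinations may no longer appear in training.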

David L. Olson


CASDMKM 2004 | Data Mining | Data Mining Algorithms | Data Sets | Decision Tree


Related Content

» Improving Rule Induction Precision for Automated Annotation by Balancing Skewed Data Sets

» Parallel Multiresolution Volume Rendering of Large Data Sets with Error-Guided Load Balanci...

» Maintaining Spatial Data Sets in Distributed-Memory Machines

» Balanced binary trees for ID management and load balance in distributed hash tables

» The effect of imbalanced data sets on LDA: A theoretical and empirical analysis

» Roughly Balanced Bagging for Imbalanced Data

» A Distance-Based Over-Sampling Method for Learning from Imbalanced Data Sets

» The Versioning System Balancing Data Amount and Access Frequency on Distributed Storage Sy...

» Hybrid Kernel Machine Ensemble for Imbalanced Data Sets

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	CASDMKM
Authors	David L. Olson
