Dynamic Clustering-Based Estimation of Missing Values in Mixed Type Data

Choosing an appropriate method for imputing missing data becomes especially important when the fraction of missing values is large and the data are of mixed type. The proposed dynamic clustering imputation (DCI) algorithm relies on similarity information from shared neighbors, where mixed-type variables are considered together. When evaluated on a public social science dataset of 46,043 mixed-type instances with up to 33% missing values, DCI improved imputation accuracy by more than 20% over the Multiple Imputation, Predictive Mean Matching, Linear and Multilevel Regression, and Mean Mode Replacement methods. Data imputed by the six methods were then used for prediction tests with NB-Tree, Random Subset Selection, and Neural Network-based classification models. In our experiments, classification accuracy obtained using DCI-preprocessed data was much better than when relying on the alternative imputation methods for data preprocessing.
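
The abstract does not spell out the DCI procedure, so the sketch below is not the authors' algorithm. It only illustrates the general idea of imputing from similar records when numeric and categorical variables are compared together. It assumes a Gower-style distance (range-scaled difference for numeric attributes, mismatch for categorical ones) and a plain k-nearest-donor rule in place of DCI's shared-neighbor clustering. The names `mixed_distance` and `impute`, the parameter `k`, and the toy records are all hypothetical.

```python
from collections import Counter
from statistics import mean


def mixed_distance(a, b, numeric_cols, ranges):
    """Gower-style distance over the attributes observed in both records."""
    total, used = 0.0, 0
    for col in a:
        x, y = a[col], b[col]
        if x is None or y is None:
            continue  # skip attributes missing in either record
        if col in numeric_cols:
            span = ranges[col]
            total += abs(x - y) / span if span else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else float("inf")


def impute(records, numeric_cols, k=2):
    """Fill each missing value (None) from the k most similar donor records.

    Assumption: a simple nearest-donor rule, not the shared-neighbor
    clustering that DCI uses.
    """
    # Range of each numeric attribute, used to scale differences to [0, 1].
    ranges = {}
    for col in numeric_cols:
        vals = [r[col] for r in records if r[col] is not None]
        ranges[col] = (max(vals) - min(vals)) if vals else 0.0

    completed = []
    for i, rec in enumerate(records):
        filled = dict(rec)
        for col, val in rec.items():
            if val is not None:
                continue
            # Donors are the other records that have this attribute observed.
            donors = sorted(
                (mixed_distance(rec, other, numeric_cols, ranges), j)
                for j, other in enumerate(records)
                if j != i and other[col] is not None
            )
            nearest = [records[j][col] for d, j in donors[:k] if d != float("inf")]
            if not nearest:
                continue  # no comparable donor; leave the value missing
            if col in numeric_cols:
                filled[col] = mean(nearest)  # numeric: average of donors
            else:
                filled[col] = Counter(nearest).most_common(1)[0][0]  # categorical: mode
        completed.append(filled)
    return completed


# Toy mixed-type data; None marks a missing value.
records = [
    {"age": 20, "income": 30.0, "status": "single"},
    {"age": 22, "income": None, "status": "single"},
    {"age": 21, "income": 32.0, "status": "single"},
    {"age": 60, "income": 90.0, "status": "married"},
    {"age": 58, "income": 88.0, "status": None},
    {"age": 61, "income": 86.0, "status": "married"},
]
completed = impute(records, numeric_cols={"age", "income"}, k=2)
print(completed[1]["income"], completed[4]["status"])
```

Computing the distance only over attributes observed in both records lets incomplete records still act as donors for other attributes, which matters when a large fraction of values is missing.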

Vadim V. Ayuyev, Joseph Jupin, Philip W. Harris, Zoran Obradovic

Data Mining | DAWAK 2009 | Dynamic Clustering Imputation | Missing Values | Mixed Type |

Added	19 May 2010
Updated	19 May 2010
Type	Conference
Year	2009
Where	DAWAK
Authors	Vadim V. Ayuyev, Joseph Jupin, Philip W. Harris, Zoran Obradovic
