Handling Missing Data in Trees: Surrogate Splits or Statistical Imputation

Abstract. In many applications of data mining a - sometimes considerable - part of the data values is missing. This may occur because the data values were simply never entered into the operational systems from which the mining table was constructed, or because for example simple domain checks indicate that entered values are incorrect. Despite the frequent occurrence of missing data, most data mining algorithms handle missing data in a rather ad-hoc way, or simply ignore the problem. We investigate simulation-based data augmentation to handle missing data, which is based on filling in (imputing) one or more plausible values for the missing data. One advantage of this approach is that the imputation phase is separated from the analysis phase, allowing for different data mining algorithms to be applied to the completed data sets. We compare the use of imputation to surrogate splits, such as used in CART, to handle missing data in tree-based mining algorithms. Experiments show that imputatio...
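
The separation of an imputation phase from an analysis phase, as the abstract describes it, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the paper's method: random hot-deck draws from the observed values of a column stand in for the model-based data augmentation the abstract mentions, a one-split decision stump stands in for a full tree learner, and the function names (`impute_once`, `fit_stump`, `fit_with_imputation`, `predict_vote`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def impute_once(rows, rng):
    """Imputation phase: return a completed copy of `rows` in which each None
    is replaced by a random draw from the observed values of its column
    (random hot-deck; an illustrative stand-in, not the paper's procedure)."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(r)] for r in rows]


def fit_stump(rows, labels):
    """Analysis phase: pick the (column, threshold) split with the fewest
    training errors on a complete data set. Any learner could be used here."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not right:  # threshold at the column maximum splits nothing
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:
        raise ValueError("no column has two distinct values to split on")
    return best[1:]  # (column, threshold, left label, right label)


def fit_with_imputation(rows, labels, m=5, seed=0):
    """Create m completed data sets and fit one stump to each of them."""
    rng = random.Random(seed)
    return [fit_stump(impute_once(rows, rng), labels) for _ in range(m)]


def predict_vote(stumps, row):
    """Classify a complete row by majority vote over the fitted stumps."""
    votes = Counter(l if row[j] <= t else r for j, t, l, r in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric attributes, None marks a missing value.
rows = [[1.0, 5.0], [2.0, None], [None, 6.0],
        [8.0, 1.0], [9.0, None], [None, 2.0]]
labels = [0, 0, 0, 1, 1, 1]

stumps = fit_with_imputation(rows, labels, m=5, seed=0)
print(predict_vote(stumps, [1.5, 5.5]))
```

Because `fit_stump` only ever sees completed data, it could be swapped for any other learner without touching `impute_once`, which is the advantage the abstract attributes to imputation. Surrogate splits, by contrast, handle missing values inside the tree algorithm itself.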

A. J. Feelders

Data Mining | Data Mining Algorithms | Data Values | PKDD 1999 |

Post Info

Added	04 Aug 2010
Updated	04 Aug 2010
Type	Conference
Year	1999
Where	PKDD
Authors	A. J. Feelders
