Sciweavers

KDD 2000, ACM

Automating exploratory data analysis for efficient data mining

Having access to large data sets for predictive data mining does not guarantee good models, even when the amount of training data is virtually unlimited. Careful data preprocessing is still required, including data cleansing, handling of missing values, attribute representation and encoding, and generation of derived attributes. In particular, selecting the most appropriate subset of attributes is a critical step in building an accurate and efficient model. We describe an automated approach to exploring and preprocessing the data and selecting the optimal attribute subset; its goal is to simplify the KDD process and dramatically shorten the time needed to build a model. Our implementation finds inappropriate and suspicious attributes, performs target dependency analysis, determines the optimal attribute encoding, generates new derived attributes, and provides a flexible approach to attribute selection. We present results generated by an industrial KDD environment ...
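The abstract does not say how the individual steps are implemented. The sketch below illustrates three of them using common, generic heuristics, and it should not be read as the authors' method. It flags suspicious attributes (constant, mostly missing, or ID-like), performs a simple target dependency analysis by ranking the remaining attributes by empirical mutual information with the target, and keeps the top k. The function names, the thresholds (`max_missing`, `max_unique_ratio`), and the example table are all hypothetical. Encoding and derived-attribute generation are not covered.

```python
import math
from collections import Counter


def screen_attribute(values, max_missing=0.5, max_unique_ratio=0.95):
    """Return reasons an attribute looks inappropriate (hypothetical heuristics)."""
    n = len(values)
    present = [v for v in values if v is not None]
    reasons = []
    if n - len(present) > max_missing * n:
        reasons.append("mostly missing")
    distinct = len(set(present))
    if distinct <= 1:
        reasons.append("constant")
    elif distinct > max_unique_ratio * len(present):
        # Nearly one distinct value per row: likely a key, useless for prediction.
        reasons.append("ID-like")
    return reasons


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def select_attributes(table, target, k):
    """Drop suspicious attributes, then keep the k most target-dependent ones."""
    ys = table[target]
    scores, dropped = {}, {}
    for name, col in table.items():
        if name == target:
            continue
        reasons = screen_attribute(col)
        if reasons:
            dropped[name] = reasons
            continue
        # Treat a missing value as its own category for dependency analysis.
        xs = ["<missing>" if v is None else v for v in col]
        scores[name] = mutual_information(xs, ys)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores, dropped


# Hypothetical example data: "churned" is the prediction target.
table = {
    "customer_id": [101, 102, 103, 104, 105, 106, 107, 108],
    "region": ["N", "S", "N", "S", "N", "S", "N", "S"],
    "plan": ["gold", "gold", "basic", "basic", "gold", "basic", "gold", "basic"],
    "country": ["US"] * 8,
    "churned": [0, 0, 1, 1, 0, 1, 0, 1],
}
selected, scores, dropped = select_attributes(table, "churned", k=1)
print(selected)  # ['plan']
print(dropped)   # {'customer_id': ['ID-like'], 'country': ['constant']}
```

Mutual information is used here only because it handles discrete attributes without assuming a model. A production system would also need to discretize numeric attributes and account for the upward bias of mutual information on high-cardinality attributes.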
Jonathan D. Becher, Pavel Berkhin, Edmund Freeman
Added 25 Aug 2010
Updated 25 Aug 2010
Type Conference
Year 2000
Where KDD