Clustering the Feature Space


Download www.di.unito.it

Abstract

Dino Ienco and Rosa Meo, Dipartimento di Informatica, Università di Torino, Italy

In this paper we propose and test the use of hierarchical clustering for feature selection in databases. The clustering method is Ward's, with a distance measure based on Goodman-Kruskal τ. We motivate the choice of this measure and compare it with other ones. Our hierarchical clustering is applied to more than 40 data sets from the UCI archive. The proposed approach is interesting from several viewpoints. First, it produces a dendrogram of feature subsets, which serves as a valuable tool for studying relevance relationships among features. Second, the dendrogram is used in a feature selection algorithm that selects the best features by a wrapper method. Experiments were run with three different families of classifiers: Naive Bayes, decision trees and k nearest neighbours. With our method, all three classifiers generally outperform their counterparts trained without feature selection. We compare our fea...
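To make the distance idea in the abstract concrete, here is a minimal Python sketch of a Goodman-Kruskal τ computation between two categorical features, together with a pairwise distance matrix of the kind a hierarchical clustering routine could consume. The function names (`gk_tau`, `tau_distance`, `distance_matrix`) and the symmetrized distance `1 - (τ(Y|X) + τ(X|Y)) / 2` are assumptions made for illustration; the abstract does not state the exact distance the authors derive from τ, and the Ward clustering step itself is not shown.

```python
from collections import Counter


def gk_tau(x, y):
    """Goodman-Kruskal tau of y given x: the proportional reduction in the
    error of predicting y once x is known (0 = no help, 1 = perfect)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))
    x_counts = Counter(x)
    y_counts = Counter(y)
    # Error when predicting y from its marginal distribution alone.
    err_y = 1.0 - sum((c / n) ** 2 for c in y_counts.values())
    if err_y == 0.0:
        # y is constant, so there is no error to reduce.
        # Returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    # Error when predicting y from its distribution conditional on x.
    err_y_given_x = 1.0 - sum(
        c * c / (n * x_counts[xv]) for (xv, _), c in joint.items()
    )
    return (err_y - err_y_given_x) / err_y


def tau_distance(x, y):
    """Hypothetical symmetric distance: tau is asymmetric, so both
    directions are averaged (an assumption, not the paper's definition)."""
    return 1.0 - 0.5 * (gk_tau(x, y) + gk_tau(y, x))


def distance_matrix(features):
    """Pairwise distances for a dict mapping feature name -> list of values."""
    names = list(features)
    return names, [
        [0.0 if a == b else tau_distance(features[a], features[b]) for b in names]
        for a in names
    ]


if __name__ == "__main__":
    data = {
        "colour": ["r", "r", "g", "g"],
        "shade": ["dark", "dark", "light", "light"],  # mirrors colour
        "coin": [0, 1, 0, 1],                         # unrelated to colour
    }
    names, dist = distance_matrix(data)
    for name, row in zip(names, dist):
        print(name, [round(d, 3) for d in row])
```

In this toy data, features that determine each other end up at distance 0 and unrelated features at distance 1, which is the behaviour one would want from a measure used to group redundant features together.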

Dino Ienco, Rosa Meo


Database | Feature Selection | Hierarchical Clustering | SEBD 2008


Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2008
Where	SEBD
Authors	Dino Ienco, Rosa Meo
