Supervised Evaluation of Dataset Partitions: Advantages and Practice



In the context of large databases, data preparation takes on greater importance: instances and explanatory attributes have to be carefully selected. In supervised learning, instance partitioning techniques have been developed for univariate representations, leading to precise and comprehensible evaluations of the amount of information an attribute contains with respect to the target attribute. The multivariate case, however, remains unaddressed. In this paper, we describe the intrinsic convenience of partitioning for data preparation and set out a framework for supervised partitioning. A new evaluation criterion for partitions of labelled objects, based on the Minimum Description Length principle, is then defined and tested on real and synthetic data sets.

1 Supervised partitioning problems in data preparation

In a data mining project, the data preparation phase is a key one. Its main goal is to provide a clean and representative database for the subsequent modelling phase [3]. Typically...

Sylvain Ferrandiz, Marc Boullé
