MLDM
2005
Springer

A Grouping Method for Categorical Attributes Having Very Large Number of Values

In supervised machine learning, partitioning the values of a categorical attribute (also called grouping) aims to construct a new synthetic attribute that preserves the information of the initial attribute while reducing the number of its values. When the number of values is very large, the risk of overfitting the data increases sharply and building good groupings becomes difficult. In this paper, we propose two new grouping methods founded on a Bayesian approach, leading to Bayes optimal groupings. The first method exploits a standard schema for grouping models, and the second extends this schema by managing a "garbage" group dedicated to the least frequent values. Extensive comparative experiments demonstrate that the new grouping methods build high-quality groupings in terms of predictive quality, robustness, and small number of groups.
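To make the idea of a "garbage" group concrete, here is a minimal Python sketch that maps the least frequent values of a categorical attribute to a single shared group. It is not the Bayesian method proposed in the paper: the fixed frequency threshold `min_count`, the function name `group_rare_values`, and the label `__garbage__` are all assumptions chosen for illustration only.

```python
from collections import Counter


def group_rare_values(values, min_count=2, garbage_label="__garbage__"):
    """Map each distinct value to itself, or to a shared garbage group
    when it occurs fewer than min_count times.

    Illustrative assumption: a fixed frequency threshold stands in for
    the paper's Bayesian criterion, which is not reproduced here.
    """
    counts = Counter(values)
    return {
        value: (value if count >= min_count else garbage_label)
        for value, count in counts.items()
    }


# Hypothetical attribute with two frequent and two rare values.
values = ["red", "red", "blue", "blue", "blue", "teal", "plum"]
mapping = group_rare_values(values, min_count=2)

# Apply the mapping to obtain the new synthetic attribute.
grouped = [mapping[v] for v in values]
```

In this toy example the four initial values collapse to three groups, since the two singleton values share the garbage group. A real grouping method would also merge frequent values with similar class distributions, which this sketch does not attempt.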
Marc Boullé
Type Conference
Year 2005
Where MLDM
Authors Marc Boullé