A Grouping Method for Categorical Attributes Having Very Large Number of Values

In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims to construct a new synthetic attribute that keeps the information of the initial attribute while reducing its number of values. When the number of values is very large, the risk of overfitting the data increases sharply and building good groupings becomes difficult. In this paper, we propose two new grouping methods based on a Bayesian approach, leading to Bayes optimal groupings. The first method exploits a standard schema for grouping models, and the second extends this schema by managing a "garbage" group dedicated to the least frequent values. Extensive comparative experiments demonstrate that the new grouping methods build high-quality groupings in terms of predictive quality, robustness and a small number of groups.
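The abstract does not state the grouping criterion. As a minimal sketch, assuming a MODL-style Bayesian cost (a prior on the number of values, on the partition of the values into groups and on the class distribution within each group, plus the multinomial likelihood of the observed class counts), a bottom-up greedy grouping could look like the code below. The cost formula, the merge heuristic and the names stirling_row, grouping_cost and greedy_grouping are illustrative assumptions, not the paper's exact method, and the "garbage" group of the second method is not modeled.

```python
import math
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def stirling_row(n):
    """Stirling numbers of the second kind S(n, k) for k = 0..n, as a tuple."""
    row = [1]  # S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            # S(m, k) = k * S(m-1, k) + S(m-1, k-1), with S(m-1, m) = 0
            prev_k = row[k] if k < len(row) else 0
            new[k] = k * prev_k + row[k - 1]
        row = new
    return tuple(row)


def log_factorial(n):
    return math.lgamma(n + 1)


def grouping_cost(groups, n_values, n_classes):
    """Assumed MODL-style cost (in nats) of a grouping; lower is better.

    groups: list of per-class count lists, one list per group.
    """
    k = len(groups)
    # Prior on the number of values and on the partition into at most k groups
    cost = math.log(n_values) + math.log(sum(stirling_row(n_values)[1:k + 1]))
    for counts in groups:
        n_k = sum(counts)
        # Prior on the class distribution inside the group
        cost += math.log(math.comb(n_k + n_classes - 1, n_classes - 1))
        # Multinomial likelihood of the observed class counts
        cost += log_factorial(n_k) - sum(log_factorial(c) for c in counts)
    return cost


def greedy_grouping(value_counts, n_classes):
    """Start with one group per value, then repeatedly apply the merge that
    lowers the cost most, stopping when no merge lowers it."""
    groups = [([v], list(c)) for v, c in value_counts.items()]
    n_values = len(groups)
    best = grouping_cost([g[1] for g in groups], n_values, n_classes)
    while len(groups) > 1:
        best_pair, best_cost = None, best
        for a, b in combinations(range(len(groups)), 2):
            merged = [x + y for x, y in zip(groups[a][1], groups[b][1])]
            others = [g[1] for i, g in enumerate(groups) if i not in (a, b)]
            cost = grouping_cost(others + [merged], n_values, n_classes)
            if cost < best_cost:
                best_pair, best_cost = (a, b), cost
        if best_pair is None:
            break  # no merge improves the criterion
        a, b = best_pair
        merged = (groups[a][0] + groups[b][0],
                  [x + y for x, y in zip(groups[a][1], groups[b][1])])
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
        best = best_cost
    return [sorted(g[0]) for g in groups], best


if __name__ == "__main__":
    # Made-up toy data: per-value counts for classes [0, 1]
    data = {"a": [50, 0], "b": [50, 0], "c": [0, 50], "d": [0, 50]}
    groups, cost = greedy_grouping(data, n_classes=2)
    print(groups, round(cost, 3))
```

On this toy data the sketch merges a with b and c with d, and stops there because merging the two pure groups would sharply increase the likelihood term. This naive version recomputes the full cost for every candidate merge, which would be far too slow for the very large numbers of values the paper targets; a practical implementation would update the cost incrementally.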

Marc Boullé

Bayes Optimal Groupings | Categorical Attribute | Initial Attribute | Machine Learning | MLDM 2005 |

Added	28 Jun 2010
Updated	28 Jun 2010
Type	Conference
Year	2005
Where	MLDM
Authors	Marc Boullé