Prediction by Categorical Features: Generalization Properties and Application to Feature Ranking

We describe and analyze a new approach for feature ranking in the presence of categorical features with a large number of possible values. It is shown that popular ranking criteria, such as the Gini index and the misclassification error, can be interpreted as the training error of a predictor that is deduced from the training set. It is then argued that using the generalization error is a more adequate ranking criterion. We propose a modification of the Gini index criterion, based on a robust estimation of the generalization error of a predictor associated with the Gini index. The properties of this new estimator are analyzed, showing that for most training sets, it produces an accurate estimation of the true generalization error. We then address the question of finding the optimal predictor that is based on a single categorical feature. It is shown that the predictor associated with the misclassification error criterion has the minimal expected generalization error. We bound the b...
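The training-error reading of the two criteria can be illustrated with a small sketch. It is not taken from the paper: it assumes binary labels in {0, 1} and the textbook forms of the criteria, in which the Gini index is the training error of a randomized predictor that outputs label 1 with the empirical frequency of 1 among examples sharing the feature value, and the misclassification error is the training error of the per-value majority vote. The function names are hypothetical.

```python
from collections import defaultdict


def per_value_counts(xs, ys):
    """Map each feature value to [number of examples, number of positive labels]."""
    counts = defaultdict(lambda: [0, 0])
    for x, y in zip(xs, ys):
        counts[x][0] += 1
        counts[x][1] += y
    return counts


def gini_index(xs, ys):
    """Expected training error of a predictor that, for feature value v,
    outputs 1 with probability q_v (empirical frequency of label 1 given v).
    Assumed textbook form: sum over v of p_v * 2 * q_v * (1 - q_v)."""
    m = len(xs)
    total = 0.0
    for n_v, pos in per_value_counts(xs, ys).values():
        q = pos / n_v
        total += (n_v / m) * 2 * q * (1 - q)
    return total


def misclassification_error(xs, ys):
    """Training error of the predictor that outputs the majority label
    observed for each feature value."""
    m = len(xs)
    counts = per_value_counts(xs, ys)
    return sum(min(pos, n_v - pos) for n_v, pos in counts.values()) / m


# Toy data (made up for illustration): one feature with two values.
xs = ["a", "a", "a", "a", "b", "b"]
ys = [1, 1, 1, 0, 0, 0]
print(gini_index(xs, ys), misclassification_error(xs, ys))

# A feature with a distinct value per example scores a perfect 0 on both
# criteria although it carries no information about unseen examples.
ids = list(range(len(ys)))
print(gini_index(ids, ys), misclassification_error(ids, ys))
```

The second call shows why training error can mislead when a feature has many possible values: a feature that is unique per example drives both criteria to zero, which is the kind of case a ranking based on generalization error is meant to handle.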

Sivan Sabato, Shai Shalev-Shwartz

COLT 2007 | Generalization Error | Gini Index | Machine Learning | Misclassification Error

Post Info

Added	07 Jun 2010
Updated	07 Jun 2010
Type	Conference
Year	2007
Where	COLT
Authors	Sivan Sabato, Shai Shalev-Shwartz
