Evaluating the performance of cost-based discretization versus entropy- and error-based discretization


Discretization is the process of dividing continuous numeric values into intervals that are treated as discrete categorical values. This article introduces cost-based discretization as a pre-processing step to the induction of a classifier, with the aim of obtaining an optimal multi-interval splitting for each numeric attribute. A transparent description of the method and the steps involved in cost-based discretization is given. The aim of this paper is to present this method and to assess the potential benefits of such an approach. Furthermore, its performance is compared against two other well-known methods, i.e. entropy-based and pure error-based discretization. To this end, experiments were carried out on 14 data sets taken from the UCI Repository on Machine Learning. To compare the different methods, the area under the Receiver Operating Characteristic (ROC) graph was used and tested for statistical significance. For most data sets the results show that cost-based d...
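To make the idea of discretization concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a binary class label (0 or 1) and searches for only a single cut point, whereas the paper targets multi-interval splitting. The function names (`discretize`, `interval_cost`, `best_single_cut`) and the `fp_cost` and `fn_cost` parameters are hypothetical and chosen only for illustration. With equal costs the search reduces to a plain error-based split; unequal costs give a simple cost-based variant.

```python
from bisect import bisect_right


def discretize(values, cut_points):
    """Map each numeric value to the index of the interval it falls in."""
    cuts = sorted(cut_points)
    # A value equal to a cut point is placed in the interval to its right.
    return [bisect_right(cuts, v) for v in values]


def interval_cost(labels, fp_cost, fn_cost):
    """Cheapest cost of assigning a single class to a whole interval.

    Assumption: labels are 0 (negative) or 1 (positive).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Predicting negative turns every positive into a false negative;
    # predicting positive turns every negative into a false positive.
    return min(pos * fn_cost, neg * fp_cost)


def best_single_cut(values, labels, fp_cost=1.0, fn_cost=1.0):
    """Return (cut_point, cost) for the cheapest binary split.

    cut_point is None when no split beats leaving the attribute unsplit.
    """
    pairs = sorted(zip(values, labels))
    ys = [y for _, y in pairs]
    best_cut = None
    best_cost = interval_cost(ys, fp_cost, fn_cost)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        cost = (interval_cost(ys[:i], fp_cost, fn_cost)
                + interval_cost(ys[i:], fp_cost, fn_cost))
        if cost < best_cost:
            best_cut, best_cost = (lo + hi) / 2, cost
    return best_cut, best_cost


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    labels = [0, 0, 1, 0, 1, 1]
    cut, cost = best_single_cut(values, labels)
    print(cut, cost)
    print(discretize(values, [cut]))
```

The exhaustive scan over candidate boundaries keeps the sketch short; a multi-interval method would need to search over sets of cut points rather than a single one.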

Davy Janssens, Tom Brijs, Koen Vanhoof, Geert Wets


COR 2006 | Cost-based Discretization | Discretization | Error-based Discretization


Post Info

Added	11 Dec 2010
Updated	11 Dec 2010
Type	Journal
Year	2006
Where	COR
Authors	Davy Janssens, Tom Brijs, Koen Vanhoof, Geert Wets
