A note on split selection bias in classification trees

A common approach to split selection in classification trees is to search through all possible splits generated by the predictor variables. A splitting criterion is then used to evaluate those splits, and the one with the largest criterion value is usually chosen to channel samples into the corresponding subnodes. However, this greedy method is biased in variable selection when the numbers of available split points differ across variables. Such a result may hamper the intuitively appealing nature of classification trees. The problem of split selection bias for two-class tasks with numerical predictors is examined. A statistical explanation of its existence is given, and a solution based on P-values is provided for the case where the Pearson chi-square statistic is used as the splitting criterion. keyword Cram
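The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure or its P-value correction; it only assumes an exhaustive search that maximizes the Pearson chi-square statistic over all splits of the form x <= c, applied to two predictors that are both independent of the class label. The function names (`chi2_2x2`, `best_split`, `selection_rate`) and the sample sizes are hypothetical choices made for illustration.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no information
    return n * (a * d - b * c) ** 2 / denom


def best_split(x, y):
    """Largest chi-square over all splits x <= c, for class labels y in {0, 1}."""
    total1 = sum(y)
    total0 = len(y) - total1
    pairs = sorted(zip(x, y))
    left0 = left1 = 0
    best = 0.0
    for i, (xi, yi) in enumerate(pairs[:-1]):
        if yi == 1:
            left1 += 1
        else:
            left0 += 1
        if xi == pairs[i + 1][0]:
            continue  # not a boundary between two distinct values
        stat = chi2_2x2(left0, left1, total0 - left0, total1 - left1)
        best = max(best, stat)
    return best


def selection_rate(n=60, trials=200, seed=0):
    """Fraction of trials in which the many-valued noise predictor wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]
        x_few = [rng.randint(0, 1) for _ in range(n)]  # at most 1 split point
        x_many = [rng.random() for _ in range(n)]      # up to n - 1 split points
        if best_split(x_many, y) > best_split(x_few, y):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print(selection_rate())
```

Neither predictor carries any information about the class, so an unbiased selection rule would pick each about half the time. Under these assumptions the predictor with more candidate split points should win far more often, because its criterion value is a maximum over many correlated statistics rather than a single one.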

Y.-S. Shih

Classification Tree | CSDA 2004 | Splitting Criterion | Statistics |

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2004
Where	CSDA
Authors	Y.-S. Shih
