Efficient decision tree construction on streaming data

Download www.cse.ohio-state.edu

Decision tree construction is a well-studied problem in data mining. Recently, there has been much interest in mining streaming data. Domingos and Hulten have presented a one-pass algorithm for decision tree construction. Their work uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. In this paper, we revisit this problem and make two contributions: 1) We present a numerical interval pruning (NIP) approach for efficiently processing numerical attributes; our results show an average reduction of 39% in execution time. 2) We exploit properties of the entropy (and gini) gain functions to reduce the sample size required to obtain a given bound on accuracy; our experimental results show a 37% reduction in the number of data instances required. Overall, the two techniques introduced here significantly improve the efficiency of decision tree construction on streaming data.
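To make the Hoeffding-based split test concrete, here is a minimal sketch of the baseline one-pass decision that the paper revisits. It is not the paper's NIP technique or its reduced-sample-size bound. The sketch assumes each candidate attribute is summarized by per-branch class counts. It computes the entropy (or gini) gain of each candidate and splits only when the best gain beats the runner-up by more than the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the gain (log2 of the number of classes for entropy) and n is the number of examples seen. All function names (`entropy`, `gini`, `split_gain`, `hoeffding_bound`, `choose_split`) are hypothetical, and the tie-breaking threshold used in practical implementations is omitted.

```python
import math
from typing import Callable, Dict, List, Optional

Counts = List[int]  # class counts, one entry per class (hypothetical representation)


def entropy(counts: Counts) -> float:
    """Shannon entropy (base 2) of a class-count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gini(counts: Counts) -> float:
    """Gini impurity of a class-count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gain(branches: List[Counts],
               impurity: Callable[[Counts], float] = entropy) -> float:
    """Impurity reduction when a node's examples are divided into `branches`."""
    parent = [sum(col) for col in zip(*branches)]  # class counts before the split
    n = sum(parent)
    children = sum(sum(b) / n * impurity(b) for b in branches)
    return impurity(parent) - children


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean lies within this epsilon of the sample mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_split(candidates: Dict[str, List[Counts]], delta: float,
                 impurity: Callable[[Counts], float] = entropy,
                 value_range: Optional[float] = None) -> Optional[str]:
    """Return the attribute to split on, or None if more examples are needed."""
    gains = sorted(((split_gain(b, impurity), name) for name, b in candidates.items()),
                   reverse=True)
    some_branches = next(iter(candidates.values()))
    n = sum(sum(b) for b in some_branches)  # examples seen at this node
    if value_range is None:
        n_classes = len(some_branches[0])
        # log2(c) bounds entropy gain; 1.0 is a safe (loose) upper bound for gini gain.
        value_range = math.log2(n_classes) if impurity is entropy else 1.0
    eps = hoeffding_bound(value_range, delta, n)
    best_gain, best_name = gains[0]
    second_gain = gains[1][0] if len(gains) > 1 else 0.0
    return best_name if best_gain - second_gain > eps else None


if __name__ == "__main__":
    # Same class proportions, different sample sizes.
    big = {"a": [[450, 50], [50, 450]], "b": [[260, 240], [240, 260]]}
    small = {"a": [[9, 1], [1, 9]], "b": [[6, 4], [4, 6]]}
    print(choose_split(big, delta=1e-7))    # a
    print(choose_split(small, delta=1e-7))  # None
```

With 1,000 examples, the gain gap (about 0.53) exceeds epsilon (about 0.09), so the node splits on `a`. With only 20 examples, the same gap is below epsilon (about 0.63), so the test waits for more data. Reducing the number of instances needed before such a decision can be made is the cost that the paper's second contribution targets.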

Ruoming Jin, Gagan Agrawal

Data Mining | Decision Tree Construction | Gain Function Entropy | KDD 2003 | Mining Streaming Data |

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2003
Where	KDD
Authors	Ruoming Jin, Gagan Agrawal

Sciweavers
