
VLDB
1998
ACM

RainForest - A Framework for Fast Decision Tree Construction of Large Datasets

Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, Sprint and QUEST). In addition to its generality, in that it yields scalable versions of a wide range of classification algorithms, our approach also offers performance improvements of over a factor of five over the Sprint algorithm, the fastest scalable classification algorithm proposed previously. In contrast to Sprint, however, our generic algorithm requires a ...
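The separation the abstract describes, between the part of tree construction that handles the data and the part that decides split quality, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`count_tables`, `gini_split`, `build_tree`), the per-attribute class-count tables, the restriction to categorical attributes, and the toy data are all assumptions made for illustration. The point is only that the split criterion is a plug-in that sees aggregated counts, so a different criterion could be swapped in without touching the data-handling code.

```python
from collections import Counter, defaultdict


def count_tables(rows, attrs, label):
    """One pass over the rows: for each attribute, count class labels per value.

    Illustrative assumption: the split criterion only needs these counts,
    not the raw rows.
    """
    tables = {a: defaultdict(Counter) for a in attrs}
    for row in rows:
        for a in attrs:
            tables[a][row[a]][row[label]] += 1
    return tables


def gini_split(tables):
    """Hypothetical quality criterion: attribute with lowest weighted Gini impurity."""
    def gini(counter):
        n = sum(counter.values())
        return 1.0 - sum((c / n) ** 2 for c in counter.values())

    best_attr, best_score = None, float("inf")
    for attr, by_value in tables.items():
        total = sum(sum(c.values()) for c in by_value.values())
        score = sum(sum(c.values()) / total * gini(c) for c in by_value.values())
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr


def build_tree(rows, attrs, label, choose_split):
    """Generic top-down induction; the criterion is passed in as choose_split."""
    labels = Counter(r[label] for r in rows)
    # Stop when the node is pure or no attributes remain; predict the majority label.
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    attr = choose_split(count_tables(rows, attrs, label))
    rest = [a for a in attrs if a != attr]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return {attr: {v: build_tree(g, rest, label, choose_split)
                   for v, g in groups.items()}}


# Toy data (made up): "outlook" fully determines "play", "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play", gini_split)
```

Passing a different function as `choose_split` (for example, one based on information gain) would change the quality criterion while leaving `count_tables` and `build_tree` unchanged.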
Added 06 Aug 2010
Updated 06 Aug 2010
Type Conference
Year 1998
Where VLDB
Authors Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti