Parallel Out-of-Core Divide-and-Conquer Techniques with Application to Classification Trees


Classification is an important problem in data mining. Constructing good classifiers is computationally intensive and offers plenty of scope for parallelization. The divide-and-conquer paradigm can be used to construct decision tree classifiers efficiently. We discuss in detail various techniques for parallel divide-and-conquer and extend them to handle disk-resident data efficiently. Furthermore, we suggest a generic technique for parallel out-of-core divide-and-conquer problems. We present pCLOUDS, the parallel version of the decision tree classifier algorithm CLOUDS, capable of handling large out-of-core data sets. pCLOUDS exhibits excellent speedup, sizeup, and scaleup properties, which make it a competitive tool for data mining applications. We evaluate the performance of pCLOUDS for a range of synthetic data sets on the IBM-SP2.
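To illustrate the divide-and-conquer structure the abstract refers to, here is a minimal sequential, in-memory sketch of decision tree construction in Python. It is not pCLOUDS or CLOUDS: the function names (`gini`, `best_split`, `build_tree`, `predict`), the exhaustive threshold search, and the use of Gini impurity as the split criterion are all assumptions made for illustration, and the sketch ignores disk-resident data entirely.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        # Hypothetical exhaustive search: try every distinct value as a threshold.
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels):
    """Divide: pick a split. Conquer: recurse on each partition independently."""
    split = best_split(rows, labels)
    if split is None:
        # Leaf: no split improves purity, so return the majority class.
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # The two subproblems share no data, so they could be solved in parallel.
    return (f, t, build_tree(*zip(*left)), build_tree(*zip(*right)))


def predict(tree, row):
    """Walk from the root to a leaf. Labels must not be tuples in this sketch."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


rows = [(1.0,), (2.0,), (3.0,), (4.0,)]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
```

Each recursive call works on a disjoint partition of the records, which is the property a parallel divide-and-conquer scheme can exploit; how pCLOUDS distributes that work and handles out-of-core data is described in the paper itself, not in this sketch.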

Mahesh K. Sreenivas, Khaled Alsabti, Sanjay Ranka


Data Mining | Data Sets | Decision Tree Classifier | Distributed And Parallel Computing | IPPS 1999


Added	03 Aug 2010
Updated	03 Aug 2010
Type	Conference
Year	1999
Where	IPPS
Authors	Mahesh K. Sreenivas, Khaled Alsabti, Sanjay Ranka
