PMCRI: A Parallel Modular Classification Rule Induction Framework

Download www.maxbramer.org.uk

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.

Frederic T. Stahl, Max A. Bramer, Mo Adda

Data Mining Technologies | Large Datasets | Machine Learning | MLDM 2009 | Top Down Induction

Post Info

Added	27 May 2010
Updated	27 May 2010
Type	Conference
Year	2009
Where	MLDM
Authors	Frederic T. Stahl, Max A. Bramer, Mo Adda
