Partitioning strategies for distributed association rule mining

14 years 5 months ago

In this paper a number of alternative strategies for distributed/parallel association rule mining are investigated. The methods examined make use of a data structure, the T-tree, introduced previously by the authors as a structure for organising sets of attributes for which support is being counted. We consider six different approaches, representing different ways of parallelising the basic Apriori-T algorithm that we use. The methods focus on different mechanisms for partitioning the data between processes, and for reducing the message-passing overhead. Both 'horizontal' (data distribution) and 'vertical' (candidate distribution) partitioning strategies are considered, including a vertical partitioning algorithm (DATA-VP) which we have developed to exploit the structure of the T-tree. We present experimental results examining the performance of the methods in implementations using JavaSpaces. We conclude that in a JavaSpaces environment, candidate distribution strategies of...
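
To make the distinction between horizontal (data distribution) and vertical (candidate distribution) partitioning concrete, the following is a minimal Python sketch of one support-counting pass. It is an illustration only, not the authors' Apriori-T or DATA-VP algorithms: it uses no T-tree and no JavaSpaces, the workers are simulated sequentially, and every function name (`candidates_of_size`, `count_support`, `horizontal`, `vertical`) and the toy transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def candidates_of_size(transactions, k):
    """Enumerate every k-itemset over the items seen (hypothetical helper)."""
    items = sorted({item for t in transactions for item in t})
    return [frozenset(c) for c in combinations(items, k)]


def count_support(transactions, candidates):
    """Count, for each candidate, the transactions that contain it."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def horizontal(transactions, candidates, n_workers):
    """Data distribution: each worker holds a slice of the transactions,
    counts ALL candidates on its slice, and the partial counts are summed."""
    slices = [transactions[i::n_workers] for i in range(n_workers)]
    total = Counter()
    for part in slices:  # each iteration stands in for one worker
        total.update(count_support(part, candidates))
    return total


def vertical(transactions, candidates, n_workers):
    """Candidate distribution: each worker owns a slice of the candidates.
    In this simplified sketch every worker scans the full data set."""
    slices = [candidates[i::n_workers] for i in range(n_workers)]
    total = Counter()
    for part in slices:  # each iteration stands in for one worker
        total.update(count_support(transactions, part))
    return total


# Toy data, invented for illustration.
transactions = [
    frozenset(t)
    for t in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])
]
pairs = candidates_of_size(transactions, 2)
h = horizontal(transactions, pairs, n_workers=2)
v = vertical(transactions, pairs, n_workers=2)
```

Both schemes produce the same support counts; they differ in what has to be exchanged. Under the horizontal scheme the workers must combine partial counts for every candidate, whereas under the vertical scheme each candidate's count is complete on the worker that owns it.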

Frans Coenen, Paul H. Leng
