
PPOPP
2005
ACM

A sampling-based framework for parallel data mining

The goal of a data mining algorithm is to discover useful information embedded in large databases. Frequent itemset mining and sequential pattern mining are two important data mining problems with broad applications. Perhaps the most efficient way to solve these problems sequentially is to apply a pattern-growth algorithm, which is a divide-and-conquer algorithm [9, 10]. In this paper, we present a framework for the parallel mining of frequent itemsets and sequential patterns, based on the divide-and-conquer strategy of pattern growth. We then discuss the load-balancing problem and introduce a sampling technique, called selective sampling, to address it. We implemented parallel versions of both frequent itemset and sequential pattern mining algorithms following our framework. The experimental results show that our parallel algorithms usually achieve excellent speedups. Categories and Subject Descriptors D.1 [Programming Techniques]: Concurrent programming—parallel programming; H.2...
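The divide-and-conquer idea behind pattern growth can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `mine` and `mine_in_tasks`, the list-of-lists transaction format, and the use of a thread pool are all assumptions made for illustration, and selective sampling is not implemented here. The sketch only shows that each frequent item yields an independent projected database, which is the kind of subproblem a parallel framework could assign to separate workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mine(db, min_support, prefix=()):
    """Return {itemset tuple: support} for frequent itemsets that extend `prefix`.

    `db` is a list of transactions (lists of comparable items).
    """
    # Count each item once per transaction.
    counts = Counter(item for txn in db for item in set(txn))
    result = {}
    for item in sorted(i for i, c in counts.items() if c >= min_support):
        pattern = prefix + (item,)
        result[pattern] = counts[item]
        # Projected database: transactions containing `item`, restricted to
        # larger items so that every itemset is generated exactly once.
        projected = [[x for x in txn if x > item] for txn in db if item in txn]
        result.update(mine(projected, min_support, pattern))
    return result


def mine_in_tasks(db, min_support, workers=2):
    """Hypothetical task decomposition: one independent task per frequent item."""
    counts = Counter(item for txn in db for item in set(txn))
    frequent = sorted(i for i, c in counts.items() if c >= min_support)

    def task(item):
        projected = [[x for x in txn if x > item] for txn in db if item in txn]
        sub = mine(projected, min_support, (item,))
        sub[(item,)] = counts[item]
        return sub

    result = {}
    # Threads are used only to show that the tasks are independent; they are
    # not meant to demonstrate real speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for sub in pool.map(task, frequent):
            result.update(sub)
    return result


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
patterns = mine_in_tasks(db, min_support=2)
```

In this sketch the projected databases can differ widely in size, so tasks may take very different amounts of time. That imbalance is the load-balancing problem the abstract refers to.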
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where PPOPP
Authors Shengnan Cong, Jiawei Han, Jay Hoeflinger, David A. Padua