One of the important problems in data mining is discovering association rules from databases of transactions where each transaction consists of a set of items. The most time consu...
Scientific workflows have become an integral part of cyberinfrastructure as their computational complexity and data sizes have grown. However, the complexity of the distributed i...
MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open source project, called Hadoop...
In this paper, we propose a parallel algorithm for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. The new par...
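
The definition of a maximal frequent itemset above can be illustrated with a small, purely sequential sketch. It is not the parallel algorithm the abstract proposes. It uses brute-force enumeration, and the function names, the toy transactions, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force sketch: map each frequent itemset to its support count.

    `min_support` is an absolute count (an assumption made for this example).
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            # Support = number of transactions that contain the candidate.
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent[cset] = count
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # k-itemset is frequent, no larger itemset can be either.
            break
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy database of five transactions.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]
frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_itemsets(frequent)
```

In this toy database every single item and every pair is frequent at a threshold of 3, while `{a, b, c}` appears in only two transactions. The maximal frequent itemsets are therefore the three pairs, and the single items are dropped because each has a frequent superset.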
Left unchecked, the fundamental drive to increase peak performance using tens of thousands of power-hungry components will lead to intolerable operating costs and failure rates. H...