This paper presents new techniques for focusing the discovery of frequent itemsets within large, dense datasets containing highly frequent items. The presence of highly frequent items adds significantly to the cost of computing the complete set of frequent itemsets. Our approach allows such items to be excluded during the candidate generation phase of the Apriori algorithm. Afterwards, the highly frequent items can be reintroduced through an inferencing framework, which makes it possible to generate frequent itemsets without counting their frequency. We demonstrate these techniques within the well-studied framework of the Apriori algorithm. Furthermore, we provide empirical results for our techniques on both synthetic and real datasets. Both kinds of data are relevant, because the real datasets exhibit statistical characteristics that differ from the probabilistic assumptions behind the synthetic data. Our source of real data was the U.S. Census.
Dennis P. Groth, Edward L. Robertson
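To make the exclude-then-reintroduce idea concrete, here is a minimal Python sketch. It is not the authors' implementation. The names `apriori` and `infer_with_frequent_items` are hypothetical. The abstract does not specify the inferencing framework, so the reintroduction step here substitutes a simple support lower bound: at most n - sup(h) of the n transactions can lack a highly frequent item h.

```python
from itertools import combinations


def apriori(transactions, minsup, exclude=frozenset()):
    """Level-wise Apriori over a list of item sets.

    Items in `exclude` (assumed here to be the highly frequent items) are
    ignored entirely, so they never appear in candidates or counts.
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    # Pass 1: count single items, skipping the excluded ones.
    counts = {}
    for t in transactions:
        for item in t:
            if item not in exclude:
                counts[item] = counts.get(item, 0) + 1
    level = {frozenset([i]): c for i, c in counts.items() if c >= minsup}
    frequent = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets pairwise, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            if len(u) == k and all(
                frozenset(s) in level for s in combinations(u, k - 1)
            ):
                candidates.add(u)
        # Count all candidates in one scan of the data.
        cand_counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        level = {c: n for c, n in cand_counts.items() if n >= minsup}
        frequent.update(level)
        k += 1
    return frequent


def infer_with_frequent_items(frequent, hf_support, n, minsup):
    """Hypothetical reintroduction step (a stand-in, not the paper's framework).

    hf_support maps each excluded highly frequent item h to sup(h).
    Since at most n - sup(h) transactions lack h, for any itemset X and
    set S of highly frequent items:
        sup(X | S) >= sup(X) - sum(n - sup(h) for h in S)
    Itemsets whose lower bound reaches minsup are reported as frequent
    without being counted; all others would still need a counting pass.
    """
    base = dict(frequent)
    base[frozenset()] = n  # the empty itemset occurs in every transaction
    hf = sorted(hf_support)
    inferred = {}
    for x, sup_x in base.items():
        for r in range(1, len(hf) + 1):
            for s in combinations(hf, r):
                bound = sup_x - sum(n - hf_support[h] for h in s)
                if bound >= minsup:
                    inferred[x | frozenset(s)] = bound
    return inferred


# Toy data: "h" is a highly frequent item (support 4 of 5).
data = [{"a", "b", "h"}, {"a", "b", "h"}, {"a", "c", "h"}, {"b", "c", "h"}, {"a", "b"}]
freq = apriori(data, minsup=2, exclude=frozenset({"h"}))
guaranteed = infer_with_frequent_items(freq, {"h": 4}, n=len(data), minsup=2)
```

On this toy data, {a, b, h} is inferred to be frequent with a lower bound of 2, with no counting pass involving h. {c, h} is also frequent (true support 2), but its bound is only 1, so under this simple bound it would still have to be counted. The looser the bound, the more itemsets fall back to counting.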