One of the most well-studied problems in data mining is computing association rules from large transactional databases. Often, the rule collections extracted by existing data-mining methods are far too large for data analysts to examine and understand carefully. In this paper, we address exactly this issue of overwhelmingly large rule collections by introducing and studying the following problem: given a large collection R of association rules, we want to pick a subset S ⊆ R that best represents the original collection R as well as the dataset from which R was extracted. We first quantify the goodness of a ruleset using two simple and intuitive definitions. Based on these definitions, we then formally define and study the corresponding optimization problems of picking the best ruleset S ⊆ R. We propose algorithms for solving these problems and present experiments to show that our algorithms work well for real datasets and lead to large reduc...
Warren L. Davis IV, Peter Schwarz, Evimaria Terzi