Frequent itemset mining has been studied extensively in data mining research ever since association rules were introduced. In this paper we address a limitation of frequent itemsets: they count only rows in which all of their attributes are present, and so do not tolerate any noise. We show that generalizing the concept of frequency while preserving the performance of mining algorithms is nontrivial, and we introduce a generalization of frequent itemsets called dense itemsets. Dense itemsets do not require all attributes to be present at the same time; instead, the itemset needs to define a sufficiently large submatrix whose proportion of present attributes exceeds a given density threshold. We consider the problem of computing all dense itemsets in a database. We give a levelwise algorithm for this problem, and also study the top-k variations, i.e., finding the k densest sets with a given support, or the k best-supported sets with a given density. These algorithms select the other parameter automatic...
Heikki Mannila, Jouni K. Seppänen
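
The difference between frequency and density described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm; it rests on one reading of the abstract, namely that an itemset is dense when some set of at least min_rows rows, restricted to the itemset's columns, has a fraction of ones at or above min_density. The function names, parameter names, and toy data are hypothetical.

```python
from typing import Sequence


def exact_support(rows: Sequence[Sequence[int]], itemset: Sequence[int]) -> int:
    """Classical support: number of rows in which every attribute of the itemset is 1."""
    return sum(1 for row in rows if all(row[j] == 1 for j in itemset))


def best_density(rows: Sequence[Sequence[int]], itemset: Sequence[int], min_rows: int) -> float:
    """Highest density of any submatrix with at least min_rows rows on the itemset's columns.

    Assumption (not taken from the paper): density is the fraction of ones in
    the submatrix. Taking exactly the min_rows rows with the most ones is
    optimal, because adding further, sparser rows cannot raise the average.
    """
    if min_rows <= 0 or not itemset:
        raise ValueError("min_rows must be positive and itemset non-empty")
    if min_rows > len(rows):
        return 0.0  # no submatrix of the required size exists
    # Number of ones each row contributes within the itemset's columns.
    counts = sorted((sum(row[j] for j in itemset) for row in rows), reverse=True)
    ones = sum(counts[:min_rows])
    return ones / (min_rows * len(itemset))


def is_dense(rows: Sequence[Sequence[int]], itemset: Sequence[int],
             min_rows: int, min_density: float) -> bool:
    """True if the itemset meets the density threshold at the given support."""
    return best_density(rows, itemset, min_rows) >= min_density


# Hypothetical toy database: 5 rows, 3 binary attributes.
toy_rows = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
toy_itemset = [0, 1, 2]

if __name__ == "__main__":
    print(exact_support(toy_rows, toy_itemset))
    print(best_density(toy_rows, toy_itemset, min_rows=4))
    print(is_dense(toy_rows, toy_itemset, min_rows=4, min_density=0.7))
```

In the toy data only one row contains all three attributes, so the itemset has low classical support, yet the four best rows hold 9 ones in a 4 x 3 submatrix, which is the kind of noisy pattern a density threshold is meant to admit.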