We perform a statistical analysis and describe the asymptotic behavior of the frequency and size distribution of δ-occurrent, minimal δ-occurrent, and maximal δ-occurrent itemsets occurring in random datasets across the entire spectrum of δ. We also describe the probability distribution of the support of an n-element itemset in a random dataset. We find that, for values of δ that are small relative to the number of transactions, the size distributions of δ-occurrent itemsets and maximal δ-occurrent itemsets can be approximated by the binomial distributions $b(L, \frac{1}{1+2^{\delta}})$ and $b(L, \frac{1}{2^{\delta}})$, respectively, where L is the inventory size. The ratios of the numbers of minimal δ-occurrent and maximal δ-occurrent itemsets to the total number of δ-occurrent itemsets are low for small values of δ and rapidly approach 1 as δ approaches the number of transactions. We also prove that the support of an n-element itemset in a random k-transaction dataset follows the binomial distribution $b(k, \frac{1}{2^{n}})$.
Dan Singer, David J. Haglin, Anna M. Manning
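The support claim in the abstract can be illustrated with a minimal Python sketch. It assumes that a random dataset is one in which each item appears in each transaction independently with probability 1/2; this assumption is consistent with the $\frac{1}{2^{n}}$ parameter but is not stated in the abstract. The function names `support_pmf`, `simulate_support`, and `mean_simulated_support` are hypothetical and chosen only for this illustration.

```python
import random
from math import comb


def support_pmf(k, n):
    """Exact pmf of the support under the b(k, 1/2^n) model.

    Entry s is the probability that exactly s of the k transactions
    contain a fixed n-element itemset.
    """
    p = 0.5 ** n
    return [comb(k, s) * p**s * (1 - p) ** (k - s) for s in range(k + 1)]


def simulate_support(k, n, rng):
    """Support of one fixed n-element itemset in one random dataset.

    Assumption: each item is present in each transaction independently
    with probability 1/2, so a transaction contains the itemset only
    when all n of its items are present.
    """
    return sum(all(rng.random() < 0.5 for _ in range(n)) for _ in range(k))


def mean_simulated_support(k, n, trials, seed=0):
    """Average simulated support over independently drawn random datasets."""
    rng = random.Random(seed)
    return sum(simulate_support(k, n, rng) for _ in range(trials)) / trials
```

Under this model the expected support is $k / 2^{n}$, so for k = 64 and n = 3 the mean returned by `mean_simulated_support` should settle near 8 as the number of trials grows.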