We study the interaction between global and local techniques in data mining. Specifically, we study the collections of frequent sets in clusters produced by a probabilistic clustering using mixtures of Bernoulli models. That is, we first analyze 0–1 datasets by a global technique (probabilistic clustering using the EM algorithm) and then do a local analysis (discovery of frequent sets) in each of the clusters. The results indicate that the use of clustering as a preliminary phase in finding frequent sets produces clusters that have significantly different collections of frequent sets. We also test the significance of the differences in the frequent set collections in the different clusters by obtaining estimates of the underlying joint density. To get from the local patterns in each cluster back to distributions, we use the maximum entropy technique [17] to obtain a local model for each cluster, and then combine these local models to get a mixture model. We obtain clear impr...
Jaakko Hollmén, Jouni K. Seppänen, Hei