The rapid growth of transactional data soon drew attention to the need to exploit it further. In this paper, we investigate the problem of preventing sensitive knowledge from being exposed in patterns extracted during association rule mining. Instead of hiding the produced rules directly, we hide the sensitive frequent itemsets that may lead to the production of these rules. As a first step, we introduce the notion of distance between two databases and a measure for quantifying it. We then propose a novel, exact algorithm for association rule hiding that seeks to minimize the distance between the original database and its sanitized version (which can safely be released), and we evaluate it on real-world datasets, demonstrating its effectiveness in solving the problem.

Categories and Subject Descriptors: H.2.8 [Database Applications]: Data mining; K.4.1 [Public Policy Issues]: Privacy.

Keywords: Privacy preserving data mining, association rule mining, sensitive item...
Aris Gkoulalas-Divanis, Vassilios S. Verykios
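
To make the idea of hiding a sensitive itemset while staying close to the original database more concrete, the following is a minimal illustrative sketch. It is not the exact algorithm of the paper. The abstract does not state how the distance is defined, so the `database_distance` function below assumes a simple measure (the number of item entries that differ between corresponding transactions). The toy data, the threshold `min_support`, and all function names are likewise hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(database: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for transaction in database if itemset <= transaction)


def database_distance(original: List[Transaction],
                      sanitized: List[Transaction]) -> int:
    """Assumed distance: number of item entries that differ between
    corresponding transactions (not necessarily the paper's measure)."""
    if len(original) != len(sanitized):
        raise ValueError("databases must have the same number of transactions")
    # The symmetric difference lists items present in exactly one version.
    return sum(len(a ^ b) for a, b in zip(original, sanitized))


# Hypothetical toy database with four transactions.
original = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
# Candidate sanitized version: item "a" removed from the second transaction.
sanitized = [frozenset("abc"), frozenset("b"), frozenset("ac"), frozenset("bc")]

sensitive = frozenset("ab")  # itemset to hide (assumed)
min_support = 2              # frequency threshold (assumed)

# The itemset is frequent in the original but not in the sanitized version.
print(support(original, sensitive) >= min_support)
print(support(sanitized, sensitive) >= min_support)
print(database_distance(original, sanitized))
```

In this sketch, a single item removal is enough to push the sensitive itemset below the threshold, so the sanitized database differs from the original in only one entry. An exact approach would search over all such candidate modifications for one that hides every sensitive itemset at minimum distance.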