We describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black box algorithms. The set-oriented nature of Algorithm SETM facilitates the development of extensions.
Maurice A. W. Houtsma, Arun N. Swami
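To illustrate the general idea that pattern counting can be phrased as an SQL query, the following is a minimal sketch and not the paper's formulation of Algorithm SETM. It assumes a hypothetical table `sales(trans_id, item)` and a hypothetical helper `frequent_pairs`, and it counts only two-item patterns with a single self-join followed by grouping. The join strategy is left to the database engine (here SQLite, through Python's standard `sqlite3` module) and is not necessarily a merge-scan join.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Count item pairs that co-occur in at least `min_support` transactions.

    `transactions` maps a transaction id to an iterable of items.
    The table name `sales` and its columns are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (trans_id INTEGER, item TEXT)")

    # One row per (transaction, item); duplicates within a transaction are dropped.
    rows = [(tid, item) for tid, items in transactions.items() for item in set(items)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Self-join on the transaction id; a.item < b.item keeps each pair once.
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b
          ON a.trans_id = b.trans_id AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = {(x, y): n for x, y, n in conn.execute(query, (min_support,))}
    conn.close()
    return result
```

Calling `frequent_pairs` with a dictionary of transactions and a minimum support threshold returns a dictionary keyed by item pair. Longer patterns would need further joins of the surviving pairs back against `sales`, which is the kind of repeated join the abstract refers to.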