One of the main obstacles in applying data mining techniques to large, real-world databases is the lack of efficient data management. In this paper, we present the design and implementation of an effective two-level architecture for a data mining environment. It consists of a mining tool and a parallel DBMS server. The mining tool organizes and controls the search process, while the DBMS provides optimal response times for the few query types being used by the tool. Key elements of our architecture are its use of fast and simple database operations, its re-use of results obtained by previous queries, its maximal use of main memory to keep the database hot-set resident, and its parallel computation of queries. Apart from a clear separation of responsibilities, we show that this architecture leads to competitive performance on large data sets. Moreover, this architecture provides a flexible experimentation platform for further studies in optimization of repetitive database queries and quality-driven rule discovery schem...
Marcel Holsheimer, Martin L. Kersten