Due to the increasing difficulty of discovering patterns in real-world databases using only conventional OLAP tools, an automated process such as data mining is currently essential. As data mining over large data sets can take a prohibitive amount of time because of the computational complexity of the algorithms, parallel processing has often been used as a solution. However, when the data does not fit in memory, some solutions do not apply and a database system may be required rather than flat files. Most implementations use the database system loosely coupled with the data mining algorithms. In this work we address the data-consuming activities through parallel processing and data fragmentation on the database server, providing a tight integration with data mining techniques. Experimental results showing the potential benefits of this integration were obtained, despite the difficulties of processing a complex application.
Mauro Sousa, Marta Mattoso, Nelson F. F. Ebecken