A grid-based approach for enterprise-scale data mining

15 years 6 months ago


Abstract: We describe a grid-based approach for enterprise-scale data mining that leverages database technology for I/O parallelism, and on-demand compute servers for compute parallelism in the statistical computations. By enterprise-scale, we mean the highly automated use of data mining in vertical business applications, where the data is stored on one or more relational database systems, and where a distributed architecture comprising high-performance compute servers or a network of low-cost commodity processors is used to improve application performance and to provide the deployment flexibility needed for overall workload management. The approach relies on an algorithmic decomposition of the data mining kernel across the data and compute grids, which makes it possible to exploit the parallelism on the respective grids in a simple way, while minimizing the data transfer between them. The overall approach is compatible with existing database standards for data mining task specifica...
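The abstract does not show the paper's actual kernel or schema, so the following is only a minimal sketch of the general idea of splitting a mining kernel between a data grid and a compute grid. It assumes a hypothetical `samples(label, x)` table, uses an in-memory SQLite database as a stand-in for the relational system, and picks per-class Gaussian parameter estimation as an illustrative kernel; none of these names or choices come from the paper. The database computes small sufficient statistics with a `GROUP BY` aggregate, and only those few rows are passed to the compute step.

```python
import sqlite3


def data_grid_summaries(conn):
    """Data-grid step (assumed): aggregate inside the database.

    Returns one small row per class: (label, count, sum(x), sum(x^2)).
    Only these summaries leave the database, not the raw rows.
    """
    return conn.execute(
        "SELECT label, COUNT(*), SUM(x), SUM(x * x) "
        "FROM samples GROUP BY label ORDER BY label"
    ).fetchall()


def compute_grid_fit(summaries):
    """Compute-grid step (assumed): turn summaries into model parameters."""
    total = sum(n for _, n, _, _ in summaries)
    model = {}
    for label, n, s, ss in summaries:
        mean = s / n
        # Population variance from the sufficient statistics;
        # clamped at zero to guard against rounding error.
        var = max(ss / n - mean * mean, 0.0)
        model[label] = {"prior": n / total, "mean": mean, "var": var}
    return model


def build_demo_db():
    """Hypothetical toy data standing in for an enterprise table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (label TEXT, x REAL)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 10.0), ("b", 13.0)],
    )
    return conn


model = compute_grid_fit(data_grid_summaries(build_demo_db()))
```

In this sketch the amount of data crossing from the database to the compute step depends on the number of classes rather than the number of rows, which is the kind of reduced data transfer the abstract refers to.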

Ramesh Natarajan, Radu Sion, Thomas Phan
