Parallel Incremental 2D-Discretization on Dynamic Datasets

Most current work in data mining assumes that the database is static, so a database update requires rediscovering all the patterns by scanning the entire old and new database. Such approaches can waste considerable computational and I/O resources and lead to relatively slow response times in what is essentially an interactive process. In this paper we address this issue in the context of 2-dimensional discretization within a multi-attribute database. Discretization, an important problem in data mining, is typically used to partition the range of one or more continuous attributes into intervals that highlight the behavior of a related discrete attribute. It can be used to build decision trees and to determine appropriate aggregations for On-Line Analytical Processing. We first propose a time-optimal solution to the problem. We then parallelize and incrementalize the algorithm so that it can dynamically maintain the required information even in the presence of data updates without re-executing the alg...
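
To make the idea of incremental maintenance concrete, the following is a minimal sketch and not the authors' algorithm, which the truncated abstract does not describe. It assumes a fixed grid of base intervals over two continuous attributes and keeps per-cell counts of the related discrete attribute, so that an insert or delete touches one cell and the entropy of any candidate region can be read from the counts without rescanning the data. The names `Grid2D`, `update`, `region_counts`, and `entropy` are hypothetical.

```python
import bisect
import math
from collections import Counter


class Grid2D:
    """Hypothetical per-cell class counts over two continuous attributes."""

    def __init__(self, x_edges, y_edges):
        # Sorted cut points defining the base intervals on each axis (assumed given).
        self.x_edges = x_edges
        self.y_edges = y_edges
        self.counts = {}  # (i, j) -> Counter of class labels in that cell

    def update(self, x, y, label, delta=1):
        """Apply one insert (delta=1) or delete (delta=-1) by touching a single cell."""
        cell = (bisect.bisect_right(self.x_edges, x),
                bisect.bisect_right(self.y_edges, y))
        cell_counts = self.counts.setdefault(cell, Counter())
        cell_counts[label] += delta
        if cell_counts[label] == 0:
            del cell_counts[label]

    def region_counts(self, x_range, y_range):
        """Sum class counts over cells with x_range[0] <= i < x_range[1], likewise for j."""
        total = Counter()
        for (i, j), cell_counts in self.counts.items():
            if x_range[0] <= i < x_range[1] and y_range[0] <= j < y_range[1]:
                total.update(cell_counts)
        return total


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    n = sum(class_counts.values())
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in class_counts.values() if k > 0)
```

Under these assumptions, a caller would feed each arriving or departing record to `update` and then compare `entropy(grid.region_counts(...))` across candidate regions. The cost of an update is independent of the database size, which is the property the abstract is after; choosing the intervals optimally and distributing the work across processors are left out of this sketch.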

Srinivasan Parthasarathy, Arun Ramakrishnan

Data Mining | Distributed And Parallel Computing | IPPS 2002 | Most Current Work | Slow Response Times |

Added	15 Jul 2010
Updated	15 Jul 2010
Type	Conference
Year	2002
Where	IPPS
Authors	Srinivasan Parthasarathy, Arun Ramakrishnan
