MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to ge...
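To make the map step described above concrete, here is a minimal single-process sketch in Python. It is not the implementation the abstract refers to. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, word counting is an assumed example task, and the reduce step follows the commonly described form of the model, in which the values emitted under each intermediate key are merged.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map step (hypothetical example): emit (word, 1) for each word in a document."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step (hypothetical example): merge all counts emitted for one word."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy sequential driver: apply map, group by intermediate key, apply reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


documents = [("doc1", "big data big clusters"), ("doc2", "big cubes")]
counts = run_mapreduce(documents, map_fn, reduce_fn)
```

A real deployment would distribute the map and reduce calls across machines and handle failures. The sketch only shows the shape of the two user-supplied functions.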
Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clusteri...
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tas...
Ying Chen, Frank K. H. A. Dehne, Todd Eavis, Andre...
Median-shift is a mode seeking algorithm that relies on computing the median of local neighborhoods, instead of the mean. We further combine median-shift with Locality Sensitive...
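The idea of shifting toward the median of a local neighborhood can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the function name `median_shift_1d`, the fixed-radius neighborhood, the `bandwidth` parameter, and the stopping rule are all hypothetical choices, and the hashing-based speedup mentioned in the abstract is not covered.

```python
from statistics import median


def median_shift_1d(points, start, bandwidth, max_iter=100):
    """Move `start` repeatedly to the median of the points within `bandwidth` of it.

    Assumed 1-D simplification: the neighborhood is a fixed-radius window and
    iteration stops when the position no longer changes or after `max_iter` steps.
    """
    x = start
    for _ in range(max_iter):
        neighbors = [p for p in points if abs(p - x) <= bandwidth]
        if not neighbors:
            break  # empty neighborhood: nothing to shift toward
        new_x = median(neighbors)
        if new_x == x:
            break  # reached a fixed point, treated here as a mode
        x = new_x
    return x


data = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2]
left_mode = median_shift_1d(data, start=0.5, bandwidth=1.0)
right_mode = median_shift_1d(data, start=5.5, bandwidth=1.0)
```

Starting points near different groups of values settle on different locations, which is how a mode-seeking procedure can be used to assign points to clusters.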
Disk and network latency must be taken into account when applying parallel computing to large multidimensional datasets because they can hinder performance by reducing the rate at...