Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications



Sampling is a widely used technique for increasing efficiency in database and data mining applications that operate on large datasets. In this paper we present a scalable sampling implementation that supports efficient, multi-dimensional spatio-temporal sample generation on dynamic, large-scale datasets stored on a storage cluster. The proposed algorithm leverages Hilbert space-filling curves to provide an approximate linear order of multi-dimensional data while maintaining spatial locality. This new implementation is then bootstrapped on top of our previous implementation, which efficiently samples large datasets along a single dimension (e.g., time), thereby realizing a service for spatio-temporal sampling. We evaluate the performance of our approach by comparing it to the popular R-tree-based technique. The experimental results show that our approach achieves up to an order of magnitude higher efficiency and scalability.
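The idea of using a Hilbert curve to linearize multi-dimensional data can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hilbert_index` and `systematic_sample`, the restriction to a two-dimensional grid whose side is a power of two, and the choice of every-k-th-point sampling are all assumptions made here for illustration. The index computation follows the well-known iterative bit-manipulation formulation of the 2-D Hilbert mapping.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve covering an n-by-n grid.

    Assumes n is a power of two and 0 <= x, y < n (illustrative restriction).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def systematic_sample(points: List[Point], n: int, k: int) -> List[Point]:
    """Hypothetical helper: order points along the curve, then keep every k-th one.

    Because nearby curve positions are nearby in space, the sample is spread
    across the grid rather than clustered in one region.
    """
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    return ordered[::k]


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    print(systematic_sample(grid, 4, 4))
```

Once the points are reduced to a single sort key, any one-dimensional sampling scheme can be applied to them, which is the sense in which the abstract describes building the multi-dimensional service on top of a single-dimension sampler.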

Xi Zhang, Tahsin M. Kurç, Joel H. Saltz, Srinivasan Parthasarathy


Distributed And Parallel Computing | IPPS 2006 | Large Datasets | Multi-dimensional Spatio-temporal Sample | Scalable Sampling Implementation


» GigaTensor: scaling tensor analysis up by 100 times, algorithms and discoveries

» Browsing large scale cheminformatics data with dimension reduction

» Deterministic CUR for Improved Large-Scale Data Analysis: An Empirical Study

» Targeted Projection Pursuit for Interactive Exploration of High Dimensional Data Sets

» Stack Trace Analysis for Large Scale Debugging

» An Application of Latent Topic Document Analysis to Large-Scale Proteomics Databases

» An analysis of a large scale habitat monitoring application

» Web Services Wind Tunnel: On Performance Testing Large-Scale Stateful Web Services

Post Info

Added	12 Jun 2010
Updated	12 Jun 2010
Type	Conference
Year	2006
Where	IPPS
Authors	Xi Zhang, Tahsin M. Kurç, Joel H. Saltz, Srinivasan Parthasarathy

