A comparison of approaches to large-scale data analysis



There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of t...

Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker


Database | Dramatic Performance Difference | Parallel DBMSs | Parallel SQL Database | SIGMOD 2009


» On the storage management and analysis of multi similarity for large scale protein structu...

» A Large Scale Data Mining Approach to Antibiotic Resistance Surveillance

» Deterministic CUR for Improved Large-Scale Data Analysis: An Empirical Study

» Permutation Filtering: A Novel Concept for Significance Analysis of Large-Scale Genomic Data

» Data prefetching for smooth navigation of large scale JPEG 2000 images

» Large-Scale Maximum Margin Discriminant Analysis Using Core Vector Machines

» SmurfPDMS: A Platform for Query Processing in Large-Scale PDMS

» Large scale statistical inference of signaling pathways from RNAi and microarray data

» An Application of Latent Topic Document Analysis to Large-Scale Proteomics Databases

Post Info

Added	05 Dec 2009
Updated	05 Dec 2009
Type	Conference
Year	2009
Where	SIGMOD
Authors	Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker


Sciweavers
