A platform for scalable one-pass analytics using MapReduce



Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programming model for processing large datasets using a cluster of machines. However, the traditional MapReduce model is not well-suited for one-pass analytics, since it is geared towards batch processing and requires the data set to be fully loaded into the cluster before running analytical queries. This paper examines, from a systems standpoint, what architectural design changes are necessary to bring the benefits of the MapReduce model to incremental one-pass analytics. Our empirical and theoretical analyses of Hadoop-based MapReduce systems show that the widely-used sort-merge implementation for partitioning and parallel processing poses a fundamental barrier to incremental one-pass analytics, despite various optimizations. To address these limitations, we propose a new data analysis platform that employs hash t...
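The contrast the abstract draws between sort-merge grouping and incremental processing can be illustrated with a small, generic sketch. It is not the paper's platform or Hadoop's implementation; the names `sort_merge_sums` and `IncrementalHashSums` are hypothetical and chosen only for illustration. The first function must see and sort the whole input before it can answer, while the hash-based class keeps a running aggregate per key and can answer after every record.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_merge_sums(records):
    """Batch-style group-by: no result exists until all input is sorted."""
    # sorted() consumes the entire input first, so this step is blocking.
    ordered = sorted(records, key=itemgetter(0))
    return {key: sum(value for _, value in group)
            for key, group in groupby(ordered, key=itemgetter(0))}


class IncrementalHashSums:
    """One-pass group-by: each record updates a per-key running sum."""

    def __init__(self):
        self.totals = defaultdict(int)

    def update(self, key, value):
        # A hash lookup replaces the sort; the answer is current immediately.
        self.totals[key] += value
        return self.totals[key]


# Illustrative input only (not data from the paper).
records = [("a", 1), ("b", 2), ("a", 3)]

batch_result = sort_merge_sums(records)

counter = IncrementalHashSums()
running = [counter.update(key, value) for key, value in records]
```

Both approaches reach the same final sums; the difference is that the hash-based version exposes an up-to-date partial answer as each record arrives, which is the property one-pass analytics needs.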

Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant J. Shenoy


Database | Database Management Systems | Fundamental Barrier | Magnitude Reduction | SIGMOD 2011 |


» An experience report on scaling tools for mining software repositories using MapReduce

» The Performance of MapReduce: An In-depth Study

» Scalable and Numerically Stable Descriptive Statistics in SystemML

» A comparison of join algorithms for log processing in MapReduce

» Longitudinal Analytics on Web Archive Data: It's About Time

» Client Cloud Evaluating Seamless Architectures for Visual Data Analytics in the Ocean Sci...

» Relational versus nonrelational database systems for data warehousing

» PreDatA preparatory data analytics on petascale machines


Added	17 Sep 2011
Updated	17 Sep 2011
Type	Journal
Year	2011
Where	SIGMOD
Authors	Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant J. Shenoy
