A comparison of join algorithms for log processing in MapReduce

Download pages.cs.wisc.edu

The MapReduce framework is increasingly being used to analyze large volumes of data. One important type of data analysis done with MapReduce is log processing, in which a click-stream or an event log is filtered, aggregated, or mined for patterns. As part of this analysis, the log often needs to be joined with reference data such as information about users. Although there have been many studies examining join algorithms in parallel and distributed DBMSs, the MapReduce framework is cumbersome for joins. MapReduce programmers often use simple but inefficient algorithms to perform joins. In this paper, we describe crucial implementation details of a number of well-known join strategies in MapReduce, and present a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster. Our results provide insights that are unique to the MapReduce platform and offer guidance on when to use a particular join algorithm on this platform.

Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian

Database | Join Algorithm | MapReduce Framework | SIGMOD 2010 | Studies Examining Join

» Parallel Star Join + DataIndexes: Efficient Query Processing in Data Warehouses and OLAP

» Search algorithms for multiway spatial joins

» Efficient Temporal Join Processing Using Indices

» A comparison of approximate Viterbi techniques and particle filtering for data estimation ...

» Automatic Term Extraction Using Log-Likelihood Based Comparison with General Reference Corp...

» javax.XXL: A Prototype for a Library of Query Processing Algorithms

» Parallel Spatial Joins Using Grid Files

» A performance comparison of distance-based query algorithms using R-trees in spatial databas...

Post Info

Added	18 Jul 2010
Updated	18 Jul 2010
Type	Conference
Year	2010
Where	SIGMOD
Authors	Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian

Sciweavers
