Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)


Download infosys.cs.uni-saarland.de

MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run complex analytical tasks over very large data sets on very large clusters and clouds. However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion. Hence, the performance of Hadoop, an open-source implementation of MapReduce, often does not match that of a well-configured parallel DBMS. In this paper we propose a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even 'notice it'). To reach this goal, rather than changing a working system (Hadoop), we inject our technology at the right places through UDFs only and affect Hadoop from inside. This has three important consequences: First, Hadoop++ significantly outperforms Hadoop. Second, any future changes of Hadoop may directly be used with Hadoop++ w...



Hadoop | Hadoop Framework | MapReduce | PVLDB 2010 |


Post Info

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	PVLDB
Authors	Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jörg Schad
