
EDBT 2012, ACM

Clydesdale: structured data processing on MapReduce

MapReduce has emerged as a promising architecture for large-scale data analytics on commodity clusters. The rapid adoption of Hive, a SQL-like data processing language on Hadoop (an open-source implementation of MapReduce), shows the increasing importance of processing structured data on MapReduce platforms. MapReduce offers several attractive properties, such as the use of low-cost hardware, fault tolerance, scalability, and elasticity. However, these advantages have come at a substantial cost in performance. In this paper we introduce Clydesdale, a novel system for structured data processing on Hadoop. We show that Clydesdale provides more than an order of magnitude improvement in performance over existing approaches, without requiring any changes to the underlying platform. Clydesdale is aimed at workloads where the data fits a star schema. It draws on column-oriented storage, tailored join plans, and multicore execution strategi...
Tim Kaldewey, Eugene J. Shekita, Sandeep Tata
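
The tailored join plans mentioned in the abstract can be illustrated with a map-side hash join for a star schema on Hadoop, where a small dimension table is loaded into each mapper's memory and the large fact table is never shuffled. The sketch below is illustrative only and is not Clydesdale's actual code; the class, file, and field names (StarJoinMapper, dim_customer.csv, the comma-separated row layouts) are hypothetical.

```java
// Minimal sketch of a map-side star join on Hadoop (not Clydesdale's actual code).
// Assumes the dimension table fits in each mapper's memory.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StarJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    // In-memory hash table: dimension key -> dimension attribute
    private final Map<String, String> dimTable = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Load the small dimension table from a local file in the task's working
        // directory ("dim_customer.csv" is a hypothetical file name).
        try (BufferedReader reader = new BufferedReader(new FileReader("dim_customer.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2); // layout (assumed): key,attribute
                if (parts.length == 2) {
                    dimTable.put(parts[0], parts[1]);
                }
            }
        }
    }

    @Override
    protected void map(LongWritable offset, Text factRow, Context context)
            throws IOException, InterruptedException {
        // Fact row layout (assumed): dimensionKey,measure
        String[] fields = factRow.toString().split(",");
        String dimAttribute = dimTable.get(fields[0]);
        if (dimAttribute != null) {
            // Emit the joined row; the large fact table never needs to be shuffled.
            context.write(new Text(dimAttribute), new Text(fields[1]));
        }
    }
}
```

In practice the dimension file would be shipped to every node via Hadoop's distributed cache (Job.addCacheFile), and the same pattern extends to multiple dimension tables by building one in-memory hash table per dimension.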
Type Conference
Year 2012
Where EDBT