Improving the Hadoop map/reduce framework to support concurrent appends through the BlobSeer BLOB management system

Download hal.inria.fr

Hadoop is a reference software framework supporting the Map/Reduce programming model. It relies on the Hadoop Distributed File System (HDFS) as its primary storage system. Although HDFS does not offer support for concurrently appending data to existing files, we argue that Map/Reduce applications, as well as other classes of applications, can benefit from such functionality. We provide support for concurrent appends by building a concurrency-optimized data storage layer based on the BlobSeer data management service. Moreover, we modify the Hadoop Map/Reduce framework to use the append operation in the "reduce" phase of the application. To validate this work, we perform experiments on a large number of nodes of the Grid'5000 testbed. We demonstrate that massively concurrent append and read operations have a low impact on each other. In addition, measurements with an application available with Hadoop show that the support for concurrent appends to shared files is introduced wi...

Diana Moise, Gabriel Antoniu, Luc Bougé

Concurrent Appends | Distributed And Parallel Computing | Hadoop | Hadoop Distributed File | HPDC 2010 |

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	HPDC
Authors	Diana Moise, Gabriel Antoniu, Luc Bougé
