Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Second, portability limitations prevent the Java implementation from exploiting features of the native platform. Third, HDFS implicitly makes portability assumptions about how the native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. This paper investigates the root causes of these performance bottlenecks in order to evaluate tradeoffs between portability and performance.
Jeffrey Shafer, Scott Rixner, Alan L. Cox
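As background for the analysis that follows, the sketch below shows how a Hadoop client typically reads a file through the user-level HDFS Java API (the org.apache.hadoop.fs package). The configuration, file path, and buffer size are illustrative assumptions, not values taken from the paper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        // Loads the cluster configuration (core-site.xml, etc.); the NameNode
        // address comes from fs.default.name (older releases) or fs.defaultFS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file path; HDFS exposes a POSIX-like namespace.
        Path file = new Path("/data/input/example.txt");

        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        try (FSDataInputStream in = fs.open(file)) {
            int n;
            while ((n = in.read(buffer)) > 0) {
                total += n; // a MapReduce task would process the bytes here
            }
        }
        System.out.println("Read " + total + " bytes from " + file);
    }
}
```

Because every map and reduce task performs its I/O through this user-level Java API rather than through the native filesystem directly, task-scheduling delays and portability decisions in this layer directly affect HDFS throughput, which is the behavior the paper examines.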