The Performance of MapReduce: An In-depth Study



MapReduce has been widely used for large-scale data analysis in the Cloud. The system is well recognized for its elastic scalability and fine-grained fault tolerance, although its performance has been noted to be suboptimal in the database context. According to a recent study [19], Hadoop, an open source implementation of MapReduce, is slower than two state-of-the-art parallel database systems in performing a variety of analytical tasks by a factor of 3.1 to 6.5. MapReduce can achieve better performance with the allocation of more compute nodes from the cloud to speed up computation; however, this approach of “renting more nodes” is not cost effective in a pay-as-you-go environment. Users desire an economical, elastically scalable data processing system and are therefore interested in whether MapReduce can offer both elastic scalability and efficiency. In this paper, we conduct a performance study of MapReduce (Hadoop) on a 100-node cluster of Amazon EC2 with various levels of p...

Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu
