MapReduce Online



MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We present a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop and can run unmodified user-defined MapReduce programs.
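The idea of online aggregation described above can be illustrated with a minimal sketch. This is not HOP's implementation: it is a single-process toy in which Python generators stand in for pipelining between map and reduce tasks, and the names `map_words`, `online_reduce`, and `snapshot_every` are hypothetical, chosen only for illustration. The reducer consumes map output record by record and periodically yields a provisional result instead of waiting for the map output to be fully materialized.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: lazily emit one (word, 1) pair per word (hypothetical helper)."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def online_reduce(
    pairs: Iterable[Tuple[str, int]], snapshot_every: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Reduce step that consumes pairs as they arrive.

    Yields (records_seen, counts_so_far) every `snapshot_every` records as an
    "early return", and yields the final counts last. This is an assumed
    simplification; it does not model fault tolerance or distribution.
    """
    counts: Counter = Counter()
    seen = 0
    for key, value in pairs:
        counts[key] += value
        seen += 1
        if seen % snapshot_every == 0:
            yield seen, dict(counts)  # provisional result
    # Emit the final result unless the last snapshot already covered it.
    if seen == 0 or seen % snapshot_every != 0:
        yield seen, dict(counts)


if __name__ == "__main__":
    lines = ["a b a", "b a c"]
    for seen, snapshot in online_reduce(map_words(lines), snapshot_every=4):
        print(seen, snapshot)
```

Because `map_words` is a generator, the reducer starts aggregating before the mapper has finished, which is the property that lets a user inspect approximate answers while the job is still running.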

Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, Russell Sears


Batch Jobs | Computer Networks | Fault Tolerance | MapReduce | NSDI 2010


» Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce

» Power management of online data-intensive services

» Network traffic characteristics of data centers in the wild

» Mining advertiser-specific user behavior using adfactors

» Quincy: fair scheduling for distributed computing clusters

» Semi-supervised truth discovery

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	NSDI
Authors	Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, Russell Sears
