Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database



In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries and replicates data so that each subquery can be rerun on a different node if the node initially responsible fails or responds too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assigned to workers, and failed jobs are rerun on other workers. Osprey is implemented using a middleware approach, with only a small amount of custom code to handle cluster coordination. Each node in the system is a discrete database system running on a separate machine. Data, in the form of tables, is partitioned among the database nodes, and each partition is replicated on several nodes using a technique called chained declustering [1]. A coordinator machine acts as a standard SQL interface to users; it ...
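As a rough illustration of the two ideas named in the abstract, the sketch below places partitions with chained declustering and then assigns one subquery per partition greedily to whichever live node holds a copy. It is a minimal sketch under stated assumptions, not Osprey's implementation: the function names `chained_declustering` and `run_query`, the default replication factor of 2, and the simplification that failed nodes are known before scheduling starts (rather than detected mid-query) are all hypothetical choices made for the example.

```python
def chained_declustering(num_partitions, num_nodes, replicas=2):
    """Return {partition: [nodes holding a copy]}.

    Assumed layout: partition p is stored on node p % num_nodes and on the
    next (replicas - 1) nodes in the chain, wrapping around at the end.
    """
    return {
        p: [(p + r) % num_nodes for r in range(replicas)]
        for p in range(num_partitions)
    }


def run_query(placement, num_nodes, failed=frozenset()):
    """Greedily assign one subquery per partition to live nodes.

    In each round, every live node takes the first pending partition it
    stores. Returns {partition: node that ran its subquery}. Raises
    RuntimeError if some partition has no live replica.
    """
    pending = sorted(placement)
    done = {}
    live = [n for n in range(num_nodes) if n not in failed]
    progress = True
    while pending and progress:
        progress = False
        for node in live:
            # First pending partition with a copy on this node, if any.
            p = next((q for q in pending if node in placement[q]), None)
            if p is not None:
                pending.remove(p)
                done[p] = node
                progress = True
    if pending:
        raise RuntimeError(f"no live replica for partitions {pending}")
    return done


if __name__ == "__main__":
    placement = chained_declustering(num_partitions=4, num_nodes=4)
    # Node 1 is down; its partitions are served by their chain neighbours.
    print(run_query(placement, num_nodes=4, failed={1}))
```

With four nodes and two copies per partition, losing any single node leaves every partition reachable, while losing two adjacent nodes in the chain makes the partition they share unavailable, which the sketch reports as an error.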

Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden


Database | Database Nodes | Fault Tolerance Properties | ICDE 2010 | Slow Nodes


Post Info

Added	20 Dec 2009
Updated	03 Jan 2010
Type	Conference
Year	2010
Where	ICDE
Authors	Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden
