ICDE
2010
IEEE

Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database

In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries and replicates data so that each subquery can be rerun on a different node if the node initially responsible fails or returns too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assigned to workers, and failed jobs are rerun on other workers. Osprey is implemented using a middleware approach, with only a small amount of custom code to handle cluster coordination. Each node in the system is a discrete database system running on a separate machine. Data, in the form of tables, is partitioned among database nodes, and each partition is replicated on several nodes using a technique called chained declustering [1]. A coordinator machine acts as a standard SQL interface to users; it ...
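The replication and rerun idea in the abstract can be illustrated with a minimal sketch. This is not Osprey's code: the function names `chained_declustering` and `run_query` are hypothetical, the placement follows the common textbook form of chained declustering (partition p is stored on node p mod N and on its successors in the ring), and the scheduler is simplified to a static least-loaded choice rather than a pull-based greedy assignment.

```python
def chained_declustering(num_partitions, num_nodes, replicas=2):
    """Map each partition to the nodes that hold a copy of it.

    Assumed layout: partition p lives on node p % num_nodes and on the
    next (replicas - 1) nodes around the ring.
    """
    if not 1 <= replicas <= num_nodes:
        raise ValueError("replicas must be between 1 and num_nodes")
    return {
        p: [(p + r) % num_nodes for r in range(replicas)]
        for p in range(num_partitions)
    }


def run_query(placement, failed_nodes=frozenset()):
    """Assign one subquery per partition to a live node holding that partition.

    Returns a dict of partition -> node. A subquery whose first-choice node
    has failed is simply placed on another replica holder instead of
    aborting the whole query.
    """
    load = {}        # number of subqueries assigned to each node so far
    assignment = {}
    for partition, nodes in placement.items():
        live = [n for n in nodes if n not in failed_nodes]
        if not live:
            raise RuntimeError(f"no live replica for partition {partition}")
        # Pick the least-loaded live replica; break ties by node id.
        node = min(live, key=lambda n: (load.get(n, 0), n))
        assignment[partition] = node
        load[node] = load.get(node, 0) + 1
    return assignment


placement = chained_declustering(num_partitions=4, num_nodes=4)
print(run_query(placement))                    # all nodes healthy
print(run_query(placement, failed_nodes={1}))  # node 1 is down
```

With two copies per partition, any single node failure still leaves every partition with one live copy, so the query completes; losing two adjacent nodes in the ring leaves one partition unreachable and the sketch raises an error.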
Added 20 Dec 2009
Updated 03 Jan 2010
Type Conference
Year 2010
Where ICDE
Authors Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden