
ESCIENCE
2007
IEEE

Intelligent Selection of Fault Tolerance Techniques on the Grid

The emergence of computational grids has led to an increased reliance on task schedulers that can guarantee the completion of tasks executed on unreliable systems. There are three common techniques for providing task-level fault tolerance on a grid: retrying, replicating, and checkpointing. While these techniques are varyingly successful at providing resilience to faults, each of them presents a tradeoff between performance and resource cost. As such, tasks having unique urgency requirements would ideally be placed using one of the techniques; for example, urgent tasks are likely to prefer the replication technique, which guarantees timely completion, whereas low-priority tasks should not incur any extra resource cost in the name of fault tolerance. This paper introduces a placement and selection strategy that, by computing the utility of each fault tolerance technique in relation to a given task, finds the set of allocation options that optimizes the global utility. Heur...
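The selection idea in the abstract can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration: the cost and on-time probability of each technique, the linear utility function, and the resource budget are hypothetical and are not taken from the paper. Exhaustive search is used only because it is the simplest way to show "pick one technique per task so that total utility is highest"; it is not presented as the paper's method.

```python
from itertools import product

# Hypothetical figures for the three techniques named in the abstract.
# cost: resource units consumed; p_on_time: assumed chance of timely completion.
TECHNIQUES = {
    "retry":      {"cost": 1.0, "p_on_time": 0.70},
    "checkpoint": {"cost": 1.5, "p_on_time": 0.85},
    "replicate":  {"cost": 3.0, "p_on_time": 0.99},
}


def utility(urgency, technique, cost_weight=1.0):
    """Assumed utility: urgency-weighted timeliness minus weighted resource cost."""
    t = TECHNIQUES[technique]
    return urgency * t["p_on_time"] - cost_weight * t["cost"]


def best_assignment(urgencies, budget):
    """Try every technique combination and keep the feasible one with the
    highest summed utility. Returns (None, -inf) if nothing fits the budget."""
    best, best_total = None, float("-inf")
    for choice in product(TECHNIQUES, repeat=len(urgencies)):
        cost = sum(TECHNIQUES[c]["cost"] for c in choice)
        if cost > budget:
            continue  # this combination exceeds the available resources
        total = sum(utility(u, c) for u, c in zip(urgencies, choice))
        if total > best_total:
            best, best_total = choice, total
    return best, best_total


if __name__ == "__main__":
    # One urgent task (urgency 20) and one low-priority task (urgency 1).
    print(best_assignment([20.0, 1.0], budget=4.0))
```

Under these assumed numbers the urgent task is given replication and the low-priority task the cheapest option, which mirrors the tradeoff the abstract describes. The search visits every combination, so it is only practical for a handful of tasks.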
Added 02 Jun 2010
Updated 02 Jun 2010
Type Conference
Year 2007
Where ESCIENCE
Authors Daniel C. Vanderster, Nikitas J. Dimopoulos, Randall J. Sobie