Intelligent Selection of Fault Tolerance Techniques on the Grid

The emergence of computational grids has led to an increased reliance on task schedulers that can guarantee the completion of tasks executed on unreliable systems. There are three common techniques for providing task-level fault tolerance on a grid: retrying, replicating, and checkpointing. While these techniques vary in how well they provide resilience to faults, each presents a tradeoff between performance and resource cost. As such, tasks with differing urgency requirements would ideally each be placed using the technique that suits them best; for example, urgent tasks are likely to prefer the replication technique, which guarantees timely completion, whereas low priority tasks should not incur any extra resource cost in the name of fault tolerance. This paper introduces a placement and selection strategy which, by computing the utility of each fault tolerance technique in relation to a given task, finds the set of allocation options which optimizes the global utility. Heur...
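The idea of scoring each fault tolerance technique per task can be illustrated with a small sketch. This is not the paper's utility function or heuristic: the fault model (independent failures with probability `p_fail`, at most two attempts or two replicas), the 5% checkpoint overhead, the `urgency` scale, and the names `Task`, `options`, `utility`, and `select` are all assumptions made for illustration. With no resource constraints, picking the best technique per task also maximizes the summed utility; a real grid scheduler would have to account for limited resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    runtime: float  # fault-free runtime, arbitrary time units
    urgency: float  # hypothetical scale: 0.0 (low priority) to 1.0 (urgent)


def options(task, p_fail):
    """Toy model: technique -> (success probability, completion time, cost)."""
    r, p = task.runtime, p_fail
    two_tries = 1 - p * p  # at least one of two independent attempts succeeds
    return {
        # No fault tolerance: a single attempt, no extra cost.
        "none": (1 - p, r, r),
        # Retry once on failure: the second run is needed with probability p.
        "retry": (two_tries, r * (1 + p), r * (1 + p)),
        # Two replicas in parallel: same completion time, double the cost.
        "replicate": (two_tries, r, 2 * r),
        # Checkpointing: assumed 5% overhead, half the work redone on failure.
        "checkpoint": (two_tries, r * (1.05 + 0.5 * p), r * (1.05 + 0.5 * p)),
    }


def utility(task, option):
    """Assumed utility: urgency rewards timeliness, low urgency penalizes cost."""
    success, time, cost = option
    timeliness = success * task.runtime / time
    extra_cost = (cost - task.runtime) / task.runtime
    return task.urgency * timeliness - (1 - task.urgency) * extra_cost


def select(tasks, p_fail):
    """Pick the highest-utility technique for each task independently."""
    plan = {}
    for task in tasks:
        scored = {
            name: utility(task, opt)
            for name, opt in options(task, p_fail).items()
        }
        plan[task.name] = max(scored, key=scored.get)
    return plan
```

Under these assumed numbers, calling `select([Task("urgent", 10.0, 0.9), Task("batch", 10.0, 0.1)], 0.2)` assigns replication to the urgent task and no fault tolerance to the low priority one, which matches the preference described in the abstract.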

Daniel C. Vanderster, Nikitas J. Dimopoulos, Randa

Distributed And Parallel Computing | ESCIENCE 2007 | Fault Tolerance | Resource Cost | Task-level Fault Tolerance

Post Info

Added	02 Jun 2010
Updated	02 Jun 2010
Type	Conference
Year	2007
Where	ESCIENCE
Authors	Daniel C. Vanderster, Nikitas J. Dimopoulos, Randall J. Sobie
