Failure-aware checkpointing in fine-grained cycle sharing systems

Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a host if they do not significantly impact the local users of the host. Since the hosts are typically provided voluntarily, their availability fluctuates greatly. To provide fault tolerance to guest jobs without adding significant computational overhead, we propose failure-aware checkpointing techniques that apply the knowledge of resource availability to select checkpoint repositories and to determine checkpoint intervals. We present the schemes of selecting reliable and efficient repositories from the non-dedicated hosts that contribute their disk storage. These schemes are formulated as 0/1 programming problems to optimize the network overhead of transferring checkpoints and the work lost due to unavailability of a storage host when needed to recover a guest job. We determine the checkpoint interval by comp...
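The abstract does not give the exact 0/1 formulation, so the following is only an illustrative sketch of the general idea: pick a fixed number of storage hosts (a 0/1 choice per host) so that the sum of checkpoint transfer cost and expected lost work is smallest. The function name select_repositories, the cost model, the independence assumption between hosts, and all numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod


def select_repositories(hosts, k, lost_work_cost):
    """Exhaustively choose k checkpoint repositories (hypothetical model).

    hosts: dict mapping host name -> (transfer_cost, availability), where
           availability is the assumed probability that the host is
           reachable when a guest job needs to recover.
    k: number of repositories to select (each host is either in or out,
       which is the 0/1 decision).
    lost_work_cost: assumed penalty when none of the chosen hosts is
                    available at recovery time.
    Returns (total_cost, chosen_hosts) for the cheapest subset.
    """
    best = None
    for subset in combinations(sorted(hosts), k):
        # Network overhead: every chosen host receives a checkpoint copy.
        transfer = sum(hosts[h][0] for h in subset)
        # Assumes hosts fail independently: recovery fails only if all
        # chosen repositories are unavailable at the same time.
        p_all_down = prod(1.0 - hosts[h][1] for h in subset)
        total = transfer + lost_work_cost * p_all_down
        if best is None or total < best[0]:
            best = (total, subset)
    return best


# Made-up example data: (transfer_cost, availability) per host.
example_hosts = {"a": (2.0, 0.9), "b": (1.0, 0.5), "c": (3.0, 0.99)}
cost, chosen = select_repositories(example_hosts, k=2, lost_work_cost=100.0)
print(chosen, round(cost, 3))
```

Exhaustive search is only practical for a handful of hosts; a real 0/1 program of this kind would normally be handed to an integer programming solver. The sketch is meant to show the trade-off between transfer overhead and lost work, not the authors' actual method.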

Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi

Checkpoint Interval | Distributed And Parallel Computing | Fault Tolerance | Guest Job | HPDC 2007 |

» Prediction of Resource Availability in Fine-Grained Cycle Sharing Systems Empirical Evaluat...

» FALCON: a system for reliable checkpoint recovery in shared grid environments

» Design and Implementation of a Middleware for Data Storage in Opportunistic Grids

Post Info

Added	02 Jun 2010
Updated	02 Jun 2010
Type	Conference
Year	2007
Where	HPDC
Authors	Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi

Sciweavers
