On Coordinated Checkpointing in Distributed Systems

IPPS 2006, IEEE

Recent advances in checkpoint/recovery systems

14 years 2 months ago

Download www.cs.cornell.edu

Checkpoint and Recovery (CPR) systems have many uses in high-performance computing. Because of this, many developers have implemented it, by hand, into their applications. One of ...

Greg Bronevetsky, Rohit Fernandes, Daniel Marques,...

HPDC 2007, IEEE

Peer-to-peer checkpointing arrangement for mobile grid computing systems

14 years 3 months ago

Download www.ucoms.org

This paper deals with a novel, distributed, QoS-aware, peer-to-peer checkpointing arrangement component for mobile Grid (MoG) computing systems middleware. Checkpointing is more cr...

Paul J. Darby III, Nian-Feng Tzeng

ICDCS 2012, IEEE

Combining Partial Redundancy and Checkpointing for HPC

11 years 11 months ago

Download moss.csc.ncsu.edu

Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...

James Elliott, Kishor Kharbas, David Fiala, Frank ...

CLOUDCOM 2010, Springer

REMEM: REmote MEMory as Checkpointing Storage

13 years 6 months ago

Download ft.ornl.gov

Checkpointing is a widely used mechanism for supporting fault tolerance, but notorious in its high-cost disk access. The idea of memory-based checkpointing has been extensively stu...

Hui Jin, Xian-He Sun, Yong Chen, Tao Ke

CLUSTER 2004, IEEE

MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware

13 years 8 months ago

Download www.cse.msstate.edu

Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...

Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
