Search Sciweavers | Sciweavers

1256 search results - page 16 / 252

» On Coordinated Checkpointing in Distributed Systems

SC
2009
ACM

FALCON: a system for reliable checkpoint recovery in shared grid environments

16 years 23 days ago

Download cobweb.ecn.purdue.edu

In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...

Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...

SIGMETRICS
2011
ACM

Record and transplay: partial checkpointing for replay debugging across heterogeneous systems

14 years 8 months ago

Download www.ncl.cs.columbia.edu

Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. To address this pr...

Dinesh Subhraveti, Jason Nieh

PODC
1998
ACM

Persistent Messages in Local Transactions

15 years 10 months ago

Download www.eecs.umich.edu

We present a new model for handling messages and state in a distributed application that we call Messages in Local Transactions (MLT). Under this model, messages and data are not...

David E. Lowell, Peter M. Chen

HCW
2000
IEEE

Reliable Cluster Computing with a New Checkpointing RAID-x Architecture

15 years 10 months ago

Download escal.yonsei.ac.kr

In a serverless cluster of PCs or workstations, the cluster must allow remote file accesses or parallel I/O directly performed over disks distributed to all client nodes. We intro...

Kai Hwang, Hai Jin, Roy S. C. Ho, Wonwoo Ro

IPPS
2005
IEEE

Current Practice and a Direction Forward in Checkpoint/Restart Implementations for Fault Tolerance

15 years 11 months ago

Download hpc.pnl.gov

Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hiberna...

José Carlos Sancho, Fabrizio Petrini, Kei D...
