Search Sciweavers | Sciweavers

307 search results - page 1 / 62

HASE
2008
IEEE

129 views, Control Systems

On the Integrity of Lightweight Checkpoints

16 years 1 month ago

Download www.ce.chalmers.se

This paper proposes a lightweight checkpointing scheme for real-time embedded systems. The goal is to separate concerns by allowing applications to take checkpoints independently ...

Raul Barbosa, Johan Karlsson

SSS
2010
Springer

143 views, Control Systems

Lightweight Live Migration for High Availability Cluster Service

15 years 5 months ago

Download www.real-time.ece.vt.edu

High availability is a critical feature for service clusters and cloud computing, and is often considered more valuable than performance. One commonly used technique to enhance the...

Bo Jiang, Binoy Ravindran, Changsoo Kim

ICPADS
2010
IEEE

169 views, Distributed And Parallel Com...

Hybrid Checkpointing for MPI Jobs in HPC Environments

15 years 4 months ago

Download moss.csc.ncsu.edu

As the core count in high-performance computing systems keeps increasing, faults are becoming commonplace. Checkpointing addresses such faults but captures full process images ev...

Chao Wang, Frank Mueller, Christian Engelmann, Ste...

JFP
2010

107 views

Lightweight checkpointing for concurrent ML

15 years 5 months ago

Download www.cs.purdue.edu

Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execut...

Lukasz Ziarek, Suresh Jagannathan

SC
2000
ACM

110 views, Applied Computing

Scalable Fault-Tolerant Distributed Shared Memory

15 years 11 months ago

Download www.sc2000.org

This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...

Florin Sultan, Thu D. Nguyen, Liviu Iftode
