Search Sciweavers | Sciweavers

Fault tolerance is a major concern to guarantee availability of critical services as well as application execution. Traditional approaches for fault tolerance include checkpoint/r...

Geoffroy Vallée, Kulathep Charoenpornwattan...

ICS
2007
Tsinghua U.

Proactive fault tolerance for HPC with Xen virtualization

16 years 22 days ago

Download www.csm.ornl.gov

Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...

Arun Babu Nagarajan, Frank Mueller, Christian Enge...

CCGRID
2006
IEEE

MPI-Mitten: Enabling Migration Technology in MPI

16 years 19 days ago

Download www.cs.iit.edu

Group communications are commonly used in parallel and distributed environments. However, existing migration mechanisms do not support group communications. This weakness prevents ...

Cong Du, Xian-He Sun

CORR
2008
Springer

Proactive Service Migration for Long-Running Byzantine Fault Tolerant Systems

15 years 6 months ago

Download academic.csuohio.edu

In this paper, we describe a proactive recovery scheme based on service migration for long-running Byzantine fault tolerant systems. Proactive recovery is an essential method for ...

Wenbing Zhao
