Sciweavers

535 search results - page 14 / 107
» Fault tolerant high performance computing by a coding approa...
ACTA
2005
13 years 7 months ago
Optimal recovery schemes in fault tolerant distributed computing
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all n computers are up and running, we would like the load to be evenly distr...
Kamilla Klonowska, Håkan Lennerstad, Lars Lu...
FTCS
1996
13 years 9 months ago
An Approach towards Benchmarking of Fault-Tolerant Commercial Systems
This paper presents a benchmark for dependable systems. The benchmark consists of two metrics, number of catastrophic incidents and performance degradation, which are obtained by a...
Timothy K. Tsai, Ravishankar K. Iyer, Doug Jewitt
CCGRID
2006
IEEE
14 years 1 month ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
IPPS
2006
IEEE
14 years 1 month ago
An advanced performance analysis of self-stabilizing protocols: stabilization time with transient faults during convergence
A self-stabilizing protocol is a brilliant framework for fault tolerance. It can recover from any number and any type of transient faults and eventually converge to its intended b...
Yoshihiro Nakaminami, Hirotsugu Kakugawa, Toshimit...
IPPS
2003
IEEE
14 years 26 days ago
Using Golomb Rulers for Optimal Recovery Schemes in Fault Tolerant Distributed Computing
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distrib...
Kamilla Klonowska, Lars Lundberg, Håkan Lenn...