Sciweavers

1268 search results - page 207 / 254
» Verifying distributed systems: the operational approach
SRDS
2007
IEEE
14 years 4 months ago
Customizable Fault Tolerance for Wide-Area Replication
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present...
Yair Amir, Brian A. Coan, Jonathan Kirsch, John La...
SOSP
2005
ACM
14 years 6 months ago
BAR fault tolerance for cooperative services
This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both Byzantin...
Amitanand S. Aiyer, Lorenzo Alvisi, Allen Clement,...
FAST
2007
13 years 11 months ago
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we presen...
Bianca Schroeder, Garth A. Gibson
USENIX
1996
13 years 11 months ago
Transparent Fault Tolerance for Parallel Applications on Networks of Workstations
This paper describes a new method for providing transparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...
Daniel J. Scales, Monica S. Lam
CCGRID
2010
IEEE
13 years 11 months ago
A High-Level Interpreted MPI Library for Parallel Computing in Volunteer Environments
Idle desktops have been successfully used to run sequential and master-slave task parallel codes on a large scale in the context of volunteer computing. However, execution of messa...
Troy P. LeBlanc, Jaspal Subhlok, Edgar Gabriel