Sciweavers

139 search results - page 7 / 28
» Software Fault Tolerance of Distributed Programs Using Compu...
Sort
View
SOSP
2001
ACM
14 years 4 months ago
BASE: Using Abstraction to Improve Fault Tolerance
ing Abstraction to Improve Fault Tolerance MIGUEL CASTRO Microsoft Research and RODRIGO RODRIGUES and BARBARA LISKOV MIT Laboratory for Computer Science Software errors are a major...
Rodrigo Rodrigues, Miguel Castro, Barbara Liskov
ASPLOS
2006
ACM
14 years 1 months ago
Dependable != unaffordable
This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of relia...
Alan L. Cox, Kartik Mohanram, Scott Rixner
CLUSTER
2003
IEEE
14 years 26 days ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
ICDCS
1995
IEEE
13 years 11 months ago
Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach
One of the mostsoughtaftersoftware innovation of thisdecade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and ...
Partha Dasgupta, Zvi M. Kedem, Michael O. Rabin
PPOPP
2003
ACM
14 years 24 days ago
Automated application-level checkpointing of MPI programs
Because of increasing hardware and software complexity, the running time of many computational science applications is now more than the mean-time-to-failure of highpeformance com...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...