Sciweavers

Implementing Fault-Tolerant Distributed Applications
CLUSTER
2004
IEEE
13 years 7 months ago
MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
CCGRID
2008
IEEE
13 years 9 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
SOSP
2001
ACM
14 years 4 months ago
BASE: Using Abstraction to Improve Fault Tolerance
Software errors are a major...
Rodrigo Rodrigues, Miguel Castro, Barbara Liskov
COMPSAC
2003
IEEE
14 years 26 days ago
Flexible Fault Tolerance in Configurable Middleware for Embedded Systems
MicroQoSCORBA (MQC) is a middleware platform that focuses on embedded applications by providing a very fine level of configurability of its internal orthogonal components. Using t...
Kevin E. Dorow
HPDC
1999
IEEE
13 years 12 months ago
Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in ...
Adnan Agbaria, Roy Friedman