Sciweavers

1166 search results - page 7 / 234
» Crash Management for Distributed Parallel Systems
EUROPAR
2001
Springer
13 years 12 months ago
Building TMR-Based Reliable Servers Despite Bounded Input Lifetimes
This paper is on the construction of a server subsystem in a client/server system in an application context where the number of potential clients can be arbitrarily large. The imp...
Paul D. Ezhilchelvan, Jean-Michel Hélary, M...
DAIS
2007
13 years 8 months ago
Parallel State Transfer in Object Replication Systems
Replication systems require a state-transfer mechanism in order to recover crashed replicas and to integrate new ones into replication groups. This paper presents and eva...
Rüdiger Kapitza, Thomas Zeman, Franz J. Hauck...
HPDC
1993
IEEE
13 years 11 months ago
Resource Management for Distributed Parallel Systems
Multiprocessor systems should exist in the larger context of distributed systems, allowing multiprocessor resources to be shared by those that need them. Unfortunately, typica...
B. Clifford Neuman, Santosh Rao
IPPS
1999
IEEE
13 years 11 months ago
Optimizing Irregular HPF Applications using Halos
This paper presents language features for High Performance Fortran (HPF) to specify non-local access patterns of distributed arrays, called halos, and to control the communication as...
Siegfried Benkner
IPPS
2007
IEEE
14 years 1 month ago
Tiresias: Black-Box Failure Prediction in Distributed Systems
Faults in distributed systems can result in errors that manifest in several ways, potentially even in parts of the system that are not collocated with the root cause. These manife...
Andrew W. Williams, Soila M. Pertet, Priya Narasim...