Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier synchronization through software, ha...
Rajeev Sivaram, Craig B. Stunkel, Dhabaleswar K. P...
Processing on the Amoeba distributed operating system is not fault-tolerant. The only concern of its processing service is to perform load balancing on the existing processors, tr...
We propose an improved probabilistic method for reading remote clocks in systems subject to unbounded communication delays and use this method to design a family of fault-tolerant...
A shared disk implementation on distributed storage requires consistent behavior of disk operations. Deterministic consensus on such behavior is impossible when even a single stor...
The paper presents objectives and results of a series of case studies in computer support for diagnosis, failure mode and effects analysis, and the creation of repair manuals in t...