We present the first wait-free and self-stabilizing implementation of a single-writer/single-reader regular register by single-writer/single-reader safe registers. The constructio...
Modern microprocessors get more and more susceptible to transient faults, e.g. caused by high-energetic particles due to high integration, clock frequencies, temperature and decre...
NAP, a detection and recovery based scheme for implementing fault-tolerant itinerant computations, is presented. We give the semantics for the scheme and describe a protocol that ...
Dag Johansen, Keith Marzullo, Fred B. Schneider, K...
As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...