Scaling feature size improves processor performance but increases each device’s susceptibility to defects (i.e., hard errors). As a result, fabrication technology must improve s...
The role of brokers in client-server systems is to accommodate flexible, open, heterogeneous system design and to facilitate fault tolerance and improved performance through load...
Omotunde Adebayo, John E. Neilson, Dorina C. Petri...
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...
Abstract—Networks-on-Chip (NoCs) are implicitly fault tolerant due to their inherent redundancy. They can overcome defective cores, links and switches. As a side effect, yield is...
Atefe Dalirsani, Stefan Holst, Melanie Elm, Hans-J...
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...