For networks employing shortest-path routing, we introduce a new recovery scheme that needs only one backup routing table. By precomputing this backup table, the network recovers...
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...
We present S, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. S i...
Declarative Networking has recently been promoted as a high-level programming paradigm to more conveniently describe and implement systems that run in a distributed fashion over a ...
In this paper, we propose a new class of interconnection networks, called “biswapped networks” (BSNs). Each BSN is built of 2n copies of some n-node basis network using a simp...