Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
The intrinsic failure mechanisms and reliability models of state-of-the-art MOSFETs are reviewed. The simulation tools and failure equivalent circuits are described. The review in...
Joseph B. Bernstein, Moshe Gurfinkel, Xiaojun Li, ...
This paper presents fault-tolerant simulations of a single-writer multi-reader regular register in storage systems. One simulation tolerates fail-stop failures of storage servers ...
Large-scale distributed systems typically have interactions among different services that create an avenue for propagation of a failure from one service to another. The failures ...
Simulation of wildfire spread remains a challenging task. In previous work, a cellular space fire spread simulation model was developed based on the Discrete Event Syst...