Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
— Based on the proposed reliability characterization model, reliability-bounded low-power design as a methodology to balance reliability enhancement and power reduction in chip d...
Shengqi Yang, Wayne Wolf, Narayanan Vijaykrishnan,...
The deployment and use of Anomaly Detection (AD) sensors often requires the intervention of a human expert to manually calibrate and optimize their performance. Depending on the si...
Gabriela F. Cretu-Ciocarlie, Angelos Stavrou, Mich...
Some "non-' or "extra-functional" features, such as reliability, security, and tracing, defy modularization mechanisms in programming languages. This makes suc...
Eric Wohlstadter, Stoney Jackson, Premkumar T. Dev...
A number of techniques have been proposed to reduce the risk of data loss in hard-drives, from redundant disks (e.g., RAID systems) to error coding within individual drives. Disk ...