Full duplication of an entire application (through spatial or temporal redundancy) would detect many errors that are benign to the application from the perspective of the end-user...
In this paper, we study replication techniques for scaling and continuous operation for a dynamic content server. Our focus is on supporting transparent and fast reconfiguration ...
In this paper, we study and model a crash-recovery target and its failure detector’s probabilistic behavior. We extend Quality of Service (QoS) metrics to measure the recovery d...
Autonomous robots offer alluring perspectives in numerous application domains: space rovers, satellites, medical assistants, tour guides, etc. However, a severe lack of trust in t...
Synchronous circuits are typically clocked considering worst case timing paths so that timing errors are avoided under all circumstances. In the case of a pipelined processor, thi...
Viswanathan Subramanian, Mikel Bezdek, Naga Durga ...
This paper concerns the validity of a widely used method for estimating the architecture-level mean time to failure (MTTF) due to soft errors. The method first calculates the fai...
Xiaodong Li, Sarita V. Adve, Pradip Bose, Jude A. ...
Aggressive CMOS scaling will make future chip multiprocessors (CMPs) increasingly susceptible to transient faults, hard errors, manufacturing defects, and process variations. Exis...
Many important applications in wireless mesh networks require reliable multicast communication. Previously, Forward Error Correction (FEC) techniques have been proved successful f...