Continued scaling of CMOS technology to smaller transistor sizes makes modern processors more susceptible to both transient and permanent hardware faults. Circuitlevel techniques ...
It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On t...
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Group communication protocols are used in fault-tolerant systems to maintain strong replica consistency. The FaultTolerant Multicast Protocol (FTMP) described here is a group comm...
Louise E. Moser, P. M. Melliar-Smith, Ruppert R. K...
Current processor allocation techniques for highly parallel systems are based on centralized front-end based algorithms. As a result, the applied strategies are restricted to stat...