As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
Traditional agreement-based Byzantine fault-tolerant (BFT) systems process all requests on all replicas to ensure consistency. In addition to the overhead for BFT protocol and sta...
The importance of fault tolerance at the processor architecture level has been made increasingly important due to rapid advancements in the design and usage of high performance de...
T. S. Ganesh, Viswanathan Subramanian, Arun K. Som...
This paper presents a new fault injection tool called Exhaustif (Exhaustive Workbench for Systems Reliability). Exhaustif is a SWIFI fault injection tool for fault tolerance verif...
The objective of this research is to convert ordinary idle PCs into virtual clusters for executing parallel applications. The paper introduces VolpexMPI that is designed to enable ...
Troy LeBlanc, Rakhi Anand, Edgar Gabriel, Jaspal S...