Fault-tolerant programs are typically not only difficult to implement but also incur extra costs in terms of performance or resource consumption. Failures are typically relatively ...
Ilwoo Chang, Matti A. Hiltunen, Richard D. Schlich...
As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...
As grids typically consist of autonomously managed subsystems with strongly varying resources, fault-tolerance forms an important aspect of the scheduling process of appl...
Maria Chtepen, Filip H. A. Claeys, Bart Dhoedt, Fi...
A middleware architecture named ROAFTS (Real-time Object-oriented Adaptive Fault Tolerance Support) is presented. ROAFTS is designed to support adaptive fault-tolerant execution ...
Considerable work has been done on providing fault tolerance capabilities for different software components on large-scale high-end computing systems. Thus far, however, these fa...
Rinku Gupta, Pete Beckman, Byung-Hoon Park, Ewing ...