We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
Distributed systems are becoming a popular way of implementing many embedded computing applications, automotive control being a common and important example. Such embedded systems...
Dynamic error processing approaches are an important mechanism to increase the reliability in a multiprocessor system, while making efficient use of the available resources. To th...
Andrea Bondavalli, Silvano Chiaradonna, Felicita D...
This work presents a software-implemented fault tolerance approach for building a reliable database application in a CORBA environment. Database applications have functional requi...
Domenico Cotroneo, Nicola Mazzocca, Luigi Romano, ...
We present a scheme to guarantee that the execution of real-time tasks can tolerate transient and intermittent faults, assuming any queue-based scheduling technique. The scheme is...