It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On th...
Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the pre...
Safety is an important requirement for many modern systems. To ensure safety of complex critical systems, well-known safety analysis methods have been formalized. This holds in pa...
A major challenge facing grid applications is the appropriate handling of failures. In this paper we address the problem of making parallel Java applications based on Remote Method...
Fault tolerant systems based on the use of software design diversity may be able to achieve high levels of reliability more cost-effectively than other approaches, such as heroic ...