As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whe...
Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaral...
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Awareness of the need for robustness in distributed systems increases as distributed systems become integral parts of day-to-day systems. Self-stabilizing while tolerating ongoing ...
This paper proposes a solution based on forward error recovery, oriented towards providing dependability of composite Web services. While exploiting their possible support for fau...