The vulnerability of computer nodes due to component failures is a critical issue for cluster-based file systems. This paper studies the development and deployment of mirroring in...
As software Distributed Shared Memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, w...
Self-healing systems focus on reducing the complexity and cost of managing dependability policies and mechanisms without human intervention. This position paper pr...
An error that occurs in a microkernel operating system service can potentially result in state corruption and service failure. A simple restart of the failed service is not always...
Francis M. David, Ellick Chan, Jeffrey C. Carlyle,...
Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, b...
Eric Rozier, Wendy Belluomini, Veera Deenadhayalan...