: The distributed recovery block (DRB) scheme is a widely applicable approach for realizing both hardware and software fault tolerance in real-time distributed and parallel compute...
A new fault tolerant architecture that provides tolerance to a broad scope of hardware, software, and communications faults is being developed. This architecture relies on widely ...
An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose, however wi...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
Reliability has become a serious concern as systems embrace nanometer technologies. In this paper, we propose a novel approach for organizing redundancy that provides high degree ...