The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal compute...
Paul S. LeMahieu, Vasken Bohossian, Jehoshua Bruck
After having drawn up a state of the art on the theoretical feasibility of a system of periodic tasks scheduled by a preemptive algorithm at fixed priorities, we show in this art...
This work presents a new switching mechanism to tolerate arbitrary faults in interconnection networks with a negligible implementation cost. Although our routing technique can be ...
We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
This paper addresses the issue of fault-tolerance in applications that make use of network storage. A network abstraction called the Network Storage Stack is presented, along with...
Scott Atchley, Stephen Soltesz, James S. Plank, Mi...