Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present...
Yair Amir, Brian A. Coan, Jonathan Kirsch, John La...
Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not c...
This paper presents the first hierarchical Byzantine fault-tolerant replication architecture suitable for systems that span multiple wide-area sites. The architecture confines ...
Yair Amir, Claudiu Danilov, Danny Dolev, Jonathan ...
The Common Object Request Broker Architecture (CORBA) specification originally did not include any support for fault tolerance. The Fault-Tolerant CORBA standard was added to address th...
This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance. SecondSite extends the Remus virtualization-based high availability...
Shriram Rajagopalan, Brendan Cully, Ryan O'Connor,...