Replication is a technique commonly used to increase the availability of services in distributed systems, including grid and web services. While replication is relatively easy for services with fully deterministic behavior, grid and web services often include nondeterministic operations. The traditional way to replicate such nondeterministic services is the primary-backup approach. While primary-backup is straightforward in synchronous systems with perfect failure detection, typical grid environments are not usually considered synchronous. This paper addresses the problem of replicating nondeterministic services by designing a protocol based on Paxos and proposing two performance optimizations suitable for replicated grid services. The first improves performance when some service operations do not change the service state, while the second optimizes grid service requests that use transactions. Evaluations on both a local cluster and PlanetLab demon...
Xianan Zhang, Flavio Junqueira, Matti A. Hiltunen,