We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
Abstract. In recent years, there has been a surge of interest in Javabased volunteer computing systems, which aim to make it possible to build very large parallel computing network...
Virtualization provides the possibility of whole machine migration and thus enables a new form of fault tolerance that is completely transparent to applications and operating syst...
Clock-related operations are one of the many sources of replica non-determinism and of replica inconsistency in fault-tolerant distributed systems. In passive replication, if the ...
Wenbing Zhao, Louise E. Moser, P. M. Melliar-Smith
In a recent paper [2] we have proposed FT-TCP: an architecture that allows a replicated service to survive crashes without breaking its TCP connections. FT-TCP is attractive in pr...
Dmitrii Zagorodnov, Keith Marzullo, Lorenzo Alvisi...