Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case of failures by preserving a consistent global checkpoint on stable storage. However, ...
This paper presents a recovery protocol for block I/O operations in Slice, a storage system architecture for high-speed LANs incorporating network-attached block storage. The goal ...
Complex distributed Internet services form the basis not only of e-commerce but increasingly of mission-critical network-based applications. What is new is that the workload and in...
Failure detectors (or, more accurately, Failure Suspectors, FS) appear to be a fundamental service upon which to build fault-tolerant, distributed applications. This paper shows t...
This paper describes an explanation-based approach to learning plans despite a computationally intractable domain theory. In this approach, the system learns an initial plan using...