It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On t...
Abstract: The use of meta-level architectures for the implementation of faulttolerant systems is today very appealing. Nevertheless, all existing fault-tolerant systems based on th...
Data aggregation plays an important role in the design of scalable systems, allowing the determination of meaningful system-wide properties to direct the execution of distributed a...
Abstract. A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both Byzantin...
Amitanand S. Aiyer, Lorenzo Alvisi, Allen Clement,...