This paper presents a benchmark for dependable systems. The benchmark consists of two metrics, the number of catastrophic incidents and the performance degradation, which are obtained by a tool that (1) generates synthetic workloads that produce a high level of CPU, memory, and I/O activity and (2) injects CPU, memory, and I/O faults according to an injection strategy. The benchmark has been installed on two TMR-based prototype machines: TMR Prototype A and TMR Prototype B. An implementation for a third prototype, which is based on a duplex architecture, is in progress. The results demonstrate the utility of the benchmark in comparing the system-level fault tolerance of these machines and in providing insight into their design. In particular, the benchmark shows that Prototype B suffers fewer catastrophic incidents than Prototype A under the same workload conditions and fault injection method. However, Prototype B also suffers more performance degradation in the presence of faults, which might be an...
Timothy K. Tsai, Ravishankar K. Iyer, Doug Jewitt