Sciweavers

ISCA
2002
IEEE

ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors

14 years 5 months ago
ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of availability, performance, and hardware cost. ReVive performs checkpointing, logging, and distributed parity protection, all memory-based. It enables recovery from a wide class of errors, including the permanent loss of an entire node. To maintain high performance, ReVive includes specialized hardware that performs frequent operations in the background, such as log and parity updates. To keep the cost low, more complex checkpointing and recovery functions are performed in software, while the hardware modifications are limited to the directory controllers of the machine. Our simulation results on a 16-processor system indicate that the average error-free execution time overhead of using ReVive is only 6.3%, while the achieved availability is better than 99.999% even when the errors occur as often as once per day.
Milos Prvulovic, Josep Torrellas, Zheng Zhang
Added 15 Jul 2010
Updated 15 Jul 2010
Type Conference
Year 2002
Where ISCA
Authors Milos Prvulovic, Josep Torrellas, Zheng Zhang
Comments (0)