FTCS
1998

How Fail-Stop are Faulty Programs?

Most fault-tolerant systems are designed to stop faulty programs before they write permanent data or communicate with other processes. This property (halt-on-failure) forms the core of the fail-stop model. Unfortunately, little experimental data exists on whether program failures actually follow the fail-stop model. This paper describes a tool, based on the SimOS complete-machine simulator, that can trace how faults propagate through memory, disk, and functions. Using this tool on the Postgres database system, we conduct a controlled experiment to measure how often faulty programs violate the fail-stop model. We find that a significant number of faults (7%) violate the fail-stop model by writing incorrect data to stable storage before halting. We then apply Postgres' transaction mechanism to undo recent changes before a crash and find that transactions reduce fail-stop violations by a factor of 3.
Subhachandra Chandra, Peter M. Chen
Added 01 Nov 2010
Updated 01 Nov 2010
Type: Conference
Year: 1998
Where: FTCS
Authors: Subhachandra Chandra, Peter M. Chen