How Fail-Stop are Faulty Programs?

Most fault-tolerant systems are designed to stop faulty programs before they write permanent data or communicate with other processes. This property (halt-on-failure) forms the core of the fail-stop model. Unfortunately, little experimental data exists on whether or not program failures follow the fail-stop model. This paper describes a tool, based on the SimOS complete-machine simulator, that can trace how faults propagate through memory, disk, and functions. Using this tool on the Postgres database system, we conduct a controlled experiment to measure how often faulty programs violate the fail-stop model. We find that a significant number of faults (7%) violate the fail-stop model by writing incorrect data to stable storage before halting. We then apply Postgres' transaction mechanism to undo recent changes before a crash and find that transactions reduce fail-stop violations by a factor of 3.
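The measurement described in the abstract can be pictured as a classification over crash runs: a run violates the fail-stop model if incorrect data is still on stable storage after the halt, and undoing the open transaction removes some of those cases. The sketch below is only an illustration of that bookkeeping. The `CrashRun` type, its field names, and the run counts are hypothetical; the data is synthetic and chosen only to mirror the headline figures in the abstract (7% and a factor of 3), not taken from the paper's experiments or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashRun:
    """One hypothetical fault-injection run that ended in a crash."""
    wrote_bad_data: bool   # incorrect data reached stable storage before the halt
    undone_by_abort: bool  # that write sat in a transaction still uncommitted at the crash


def violates_fail_stop(run: CrashRun, with_transactions: bool = False) -> bool:
    """Return True if bad data survives on stable storage after the crash."""
    if not run.wrote_bad_data:
        return False
    # Assumption: aborting the open transaction fully reverts its writes.
    if with_transactions and run.undone_by_abort:
        return False
    return True


def violation_rate(runs, with_transactions: bool = False) -> float:
    """Fraction of crash runs that violate the fail-stop model."""
    violations = sum(violates_fail_stop(r, with_transactions) for r in runs)
    return violations / len(runs)


# Synthetic runs (NOT the paper's data): 300 crashes, 21 with bad writes,
# 14 of which would be rolled back by a transaction abort.
runs = (
    [CrashRun(False, False)] * 279
    + [CrashRun(True, True)] * 14
    + [CrashRun(True, False)] * 7
)

baseline = violation_rate(runs)
with_txn = violation_rate(runs, with_transactions=True)
print(f"baseline: {baseline:.1%}, with transactions: {with_txn:.1%}")
```

Separating the per-run predicate from the rate calculation keeps the definition of a violation in one place, so the same set of runs can be scored with and without the transaction undo step.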

Subhachandra Chandra, Peter M. Chen

Fail-stop Model | Faulty Programs | FTCS 1998 | Most Fault-tolerant Systems |

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	1998
Where	FTCS
Authors	Subhachandra Chandra, Peter M. Chen
