Sciweavers

USENIX
2007

Exploring Recovery from Operating System Lockups

Operating system lockup errors can render a computer unusable by preventing the execution of other programs. Watchdog timers can be used to recover from a lockup by resetting the processor and rebooting the system when a lockup is detected. This results in a loss of unsaved data in running programs. Based on the observation that volatile memory is not affected when a processor reset occurs, we present an approach to recover from a watchdog reset with minimal or zero loss of application state. We study the resolution of lockup conditions using thread termination and using exception dispatch. Thread termination can still result in a usable system and is already used as a recovery strategy for other errors in Linux. Using exceptions allows developers to write code to handle a lockup within the erroneous thread and attempt application-transparent recovery. Fault injection experiments show that a significant percentage of lockups can be recovered by thread termination. Exception handling f...
Francis M. David, Jeffrey C. Carlyle, Roy H. Campb
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2007
Where USENIX
Authors Francis M. David, Jeffrey C. Carlyle, Roy H. Campbell