DSN
2006
IEEE

Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults

The Solaris 10 Operating System includes a number of new features for predictive self-healing. One such feature is the ability of the Fault Management software to diagnose memory errors and drive automatic memory page retirement (MPR), intended to reduce the negative impact of permanent memory faults that generate either correctable or uncorrectable errors on system reliability, availability, and serviceability (RAS). The MPR technique allows memory pages suffering from correctable errors and relocatable clean pages suffering from uncorrectable errors to be removed from use in the virtual memory system without interrupting user applications. It also allows relocatable dirty pages associated with uncorrectable errors to be isolated with limited impact on affected user processes, avoiding an outage for the entire system. This study applies analytical models, with parameters calibrated by field experience, to quantify the reduction that can be made by this operating system self-healing t...
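The page-handling cases named in the abstract can be laid out as a small decision function. This is only an illustrative sketch, not Solaris code: the names `ErrorKind`, `Action`, `PageFault`, and `mpr_action` are hypothetical, and it assumes that "relocatable" qualifies only the uncorrectable-error cases. Any case the abstract does not describe is reported as such rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


class Action(Enum):
    # Page is removed from use without interrupting user applications.
    RETIRE_TRANSPARENTLY = "retire"
    # Page is isolated; only the affected user processes are impacted.
    ISOLATE_AFFECTED_PROCESSES = "isolate"
    # The abstract does not say what happens in this case.
    NOT_COVERED = "not covered"


@dataclass(frozen=True)
class PageFault:
    """Hypothetical description of a faulty page, for illustration only."""
    error: ErrorKind
    relocatable: bool
    dirty: bool


def mpr_action(fault: PageFault) -> Action:
    """Map a faulty page to the outcome the abstract describes for it."""
    # Assumption: pages with correctable errors are retired regardless of
    # the relocatable/dirty flags, which is one reading of the abstract.
    if fault.error is ErrorKind.CORRECTABLE:
        return Action.RETIRE_TRANSPARENTLY
    # Uncorrectable error on a relocatable clean page: retire transparently.
    if fault.relocatable and not fault.dirty:
        return Action.RETIRE_TRANSPARENTLY
    # Uncorrectable error on a relocatable dirty page: isolate the page and
    # limit the impact to affected processes, avoiding a full system outage.
    if fault.relocatable and fault.dirty:
        return Action.ISOLATE_AFFECTED_PROCESSES
    # Non-relocatable pages with uncorrectable errors are not discussed.
    return Action.NOT_COVERED
```

For example, `mpr_action(PageFault(ErrorKind.UNCORRECTABLE, relocatable=True, dirty=True))` yields `Action.ISOLATE_AFFECTED_PROCESSES` under these assumptions.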
Added: 11 Jun 2010
Updated: 11 Jun 2010
Type: Conference
Year: 2006
Where: DSN
Authors: Dong Tang, Peter Carruthers, Zuheir Totari, Michael W. Shapiro