Although the inherent resource redundancy of distributed applications facilitates gracefully degradable services, methods that enhance their dependability may have subtle yet significant performance implications, especially when the applications are stateful. In this paper, we present a performability-oriented framework that enables the realization of software rejuvenation in stateful distributed applications. The framework rests on three building blocks: a rejuvenation algorithm, a set of performability metrics, and a performability model. We demonstrate via model-based evaluation that this framework enables error-accumulation-prone distributed applications to deliver services at the best possible performance level, even in environments where the system is highly vulnerable to failures.
Ann T. Tai, Kam S. Tso, William H. Sanders, Savio