This paper demonstrates that the dependability of generic, evolving J2EE applications can be enhanced through a combination of a few recovery-oriented techniques. Our goal is to r...
George Candea, Emre Kiciman, Steve Zhang, Pedram K...
This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterpr...
In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring appl...
George Candea, Emre Kiciman, Shinichi Kawamoto, Ar...
It is important that long-running server programs retain availability amidst software failures. However, server programs do fail, and one of the important causes of failures in ser...
Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover softw...
Stelios Sidiroglou, Oren Laadan, Carlos Perez, Nic...