Fault tolerance is a constant concern in data centers, where servers must run with a minimal level of failures. Changes in operating conditions or in server demand, and variations in the system's own failure rate, have to be handled in such a way that SLAs are honored and services are not interrupted. We present an approach to handling fault tolerance requirements that is based on component replication, supported by a context-aware infrastructure, and guided by contracts that describe adaptation policies for each application. At run time, the infrastructure autonomically manages deployment, resource monitoring, and the maintenance of the fault tolerance requirements described in the contract, and it reconfigures the application when necessary to maintain compliance. An example with an Apache web server and replicated Tomcat servers is used to validate the approach.
André Luiz B. Rodrigues, Leila N. Bezerra,