ABHA: A Framework for Autonomic Job Recovery


Download www.stottlerhenke.com

Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; and rapidly integrating this information into the cluster architecture so that the failure is better mitigated in the future. The Agent Based High Availability (ABHA) system provides an API and a collection of services for building autonomic batch job recovery into cluster computing environments. Its agent API allows users to define agents for failure diagnosis and recovery. ABHA is currently being evaluated in the U.S. Department of Energy's STAR project.
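
The abstract does not show the agent API itself, so the following is only an illustrative sketch of the three steps it names (recognizing a failure, diagnosing it well enough to decide on a restart, and recording the outcome for future use). Every name here (JobReport, Diagnosis, DiskFullAgent, TransientNetworkAgent, RecoveryCoordinator) is hypothetical and is not taken from ABHA; the log strings matched are likewise assumptions chosen for the example.

```python
# Hypothetical sketch: none of these names come from the actual ABHA API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class JobReport:
    """Minimal view of a finished batch job (assumed fields)."""
    job_id: str
    exit_code: int
    log_tail: str


@dataclass
class Diagnosis:
    cause: str
    restartable: bool


class DiagnosisAgent(Protocol):
    """A user-defined agent returns a Diagnosis, or None if it has no opinion."""
    def diagnose(self, report: JobReport) -> Optional[Diagnosis]: ...


class DiskFullAgent:
    def diagnose(self, report: JobReport) -> Optional[Diagnosis]:
        if "No space left on device" in report.log_tail:
            # Restarting would fail again until space is freed.
            return Diagnosis(cause="disk_full", restartable=False)
        return None


class TransientNetworkAgent:
    def diagnose(self, report: JobReport) -> Optional[Diagnosis]:
        if "Connection timed out" in report.log_tail:
            return Diagnosis(cause="network_timeout", restartable=True)
        return None


@dataclass
class RecoveryCoordinator:
    agents: List[DiagnosisAgent]
    # Count of failures per diagnosed cause, kept so later decisions can use it.
    history: Dict[str, int] = field(default_factory=dict)

    def handle(self, report: JobReport) -> str:
        # Step 1: recognize failure (here, simply a non-zero exit code).
        if report.exit_code == 0:
            return "ok"
        # Step 2: ask each agent in turn; the first diagnosis wins.
        for agent in self.agents:
            diagnosis = agent.diagnose(report)
            if diagnosis is not None:
                # Step 3: record the cause, then decide on a restart.
                self.history[diagnosis.cause] = self.history.get(diagnosis.cause, 0) + 1
                return "restart" if diagnosis.restartable else "hold"
        # No agent recognized the failure: record it and do not restart blindly.
        self.history["unknown"] = self.history.get("unknown", 0) + 1
        return "hold"
```

In this sketch a new failure mode is handled by adding another agent class to the coordinator's list, which is one plausible reading of "users define agents for failure diagnosis and recovery"; the real system may structure this quite differently.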

Charles Earl, Emilio Remolina, Jim Ong, John Brown, Chris Kuszmaul, Brad Stone


Autonomic Job Recovery | Cluster Computing | DSOM 2004 | Job Failure |


Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	DSOM
Authors	Charles Earl, Emilio Remolina, Jim Ong, John Brown, Chris Kuszmaul, Brad Stone
