
GCA
2007

Fault-Tolerant Job Execution over Multi-Clusters Using Mobile Agents

AgentTeamwork is a mobile-agent-based job coordination system that targets a mixture of computing nodes: some connected directly to the public Internet, others clustered in a private IP domain but not managed by a commodity job scheduler. The system allows its mobile agents to carry a user job from public to private IP domains and to form a hierarchy in which agents are recursively spawned to launch a job at a different node, to monitor their parent and children, to resume them after a crash, and to relay a job-termination signal to the root agent (i.e., the one communicating directly with a user). To manage multiple clusters, all agents running within the same cluster constitute a subtree derived from the agent residing at their cluster head. This algorithm enables mobile agents to deploy a job to multiple cluster heads and from there to their cluster-internal computing nodes, as well as to monitor and resume the job both across clusters and within eac...
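The hierarchy described in the abstract can be illustrated with a small sketch. This is not AgentTeamwork's actual API or implementation: the `Agent` class, the `deploy`, `walk`, and `sweep` helpers, and the node names are all hypothetical, and crash detection is reduced to an `alive` flag. It only shows the shape of the idea: one subtree per cluster head, each agent watching its parent and children, and termination signals relayed up to the root.

```python
class Agent:
    """Hypothetical agent node in the hierarchy (illustration only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.alive = True
        self.finished = []  # termination signals; filled only at the root

    def spawn(self, name):
        """Create a child agent responsible for another node."""
        child = Agent(name, parent=self)
        self.children.append(child)
        return child

    def monitor(self):
        """Check the parent and children; resume any that crashed."""
        neighbors = list(self.children)
        if self.parent is not None:
            neighbors.insert(0, self.parent)
        resumed = []
        for other in neighbors:
            if not other.alive:
                other.alive = True  # stands in for restarting from a snapshot
                resumed.append(other.name)
        return resumed

    def relay_termination(self, origin):
        """Pass a job-termination signal upward until it reaches the root."""
        if self.parent is None:
            self.finished.append(origin)
        else:
            self.parent.relay_termination(origin)


def deploy(clusters):
    """Build one subtree per cluster head under a single root agent."""
    root = Agent("root")
    for head, nodes in clusters.items():
        head_agent = root.spawn(head)
        for node in nodes:
            head_agent.spawn(node)
    return root


def walk(agent):
    """Yield every agent in the tree, parents before children."""
    yield agent
    for child in agent.children:
        yield from walk(child)


def sweep(root):
    """Let every live agent run one monitoring pass."""
    resumed = []
    for agent in walk(root):
        if agent.alive:
            resumed.extend(agent.monitor())
    return resumed


root = deploy({"head-A": ["a1", "a2"], "head-B": ["b1"]})
head_a, head_b = root.children
head_a.children[1].alive = False   # a2 crashes inside cluster A
head_b.alive = False               # the head of cluster B crashes
resumed = sweep(root)
head_b.children[0].relay_termination("b1")
```

In this toy run the root resumes the crashed cluster head, and the head of cluster A resumes its crashed internal node, which mirrors the across-cluster and within-cluster recovery the abstract mentions.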
Munehiro Fukuda, Emory Horvath, Solomon Lane
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where GCA