As the desire of scientists to perform ever larger computations drives the size of today’s high performance computers from hundreds, to thousands, and even tens of thousands of ...
Assigning an application’s fault-tolerance properties (e.g., replication style, checkpointing frequency) statically, and in an arbitrary manner, can lead to the application not ...
We describe the interface between a real-time resource allocation system and an AI planner in order to create fault-tolerant plans that are guaranteed to execute in hard real-tim...
Ella M. Atkins, Tarek F. Abdelzaher, Kang G. Shin,...
Grid Resource Discovery Service is currently an important focus of research. We propose a scheme that presents essential characteristics for efficient, self-configuring and fau...
We introduce Zen, a new resource allocation framework that assigns application components to node clusters to achieve high availability for partial-fault tolerant (PFT) applicat...