Abstract. Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading ca...
Management of large-scale parallel and distributed applications is an extremely complex task due to factors such as centralized management architectures, lack of coordination and ...
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
This paper investigates randomization and replication as strategies to achieve reliable performance in disk arrays targeted for video-on-demand (VoD) workloads. A disk array can p...
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a ho...