With the advent of Cloud computing, large-scale virtualized compute and data centers are becoming common in the computing industry. These distributed systems leverage commodity ...
Gregor von Laszewski, Lizhe Wang, Andrew J. Younge...
Virtual machines can greatly simplify wide-area distributed computing by lowering the level of abstraction to the benefit of both resource providers and users. Networking, however, can be ...
This work aims to pave the way for high availability in high-performance computing (HPC) by focusing on efficient redundancy strategies for head and service nodes. These...
Christian Engelmann, Stephen L. Scott, Chokchai Le...
The simulations used in the field of high energy physics are compute intensive and exhibit a high level of data parallelism. These features make such simulations ideal candidates ...
Laura Gilbert, Jeff Tseng, Rhys Newman, Saeed Iqba...
Maintenance is the dominant source of downtime at high availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two short...