Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Recent experimental advances have demonstrated technologies capable of supporting scalable quantum computation. A critical next step is how to put those technologies together into...
Tzvetan S. Metodi, Darshan D. Thaker, Andrew W. Cr...
—Ultra large scale (ULS) systems are future software intensive systems that have billions of lines of code, composed of heterogeneous, changing, inconsistent and independent elem...
Reliability has become a serious concern as systems embrace nanometer technologies. In this paper, we propose a novel approach for organizing redundancy that provides high degree ...
Dependable distributed systems are difficult to build. This is particularly true if they have dependability requirements that change during the execution of an application, and are...
Michel Cukier, Jennifer Ren, Chetan Sabnis, David ...