Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long-running jobs. Increasing fault tolera...
We present ECC FIFO, a mechanism enabling two-tiered last-level cache error protection using an arbitrarily strong tier-2 code without increasing on-chip storage. Instead of addin...
Supporting uninterrupted services for distributed soft real-time applications is hard in resource-constrained and dynamic environments, where processor or process failures and sys...
As computing capabilities have increased, the coupling of computational models has become an increasingly viable and therefore important way of improving the physical fi...
Wael R. Elwasif, David E. Bernholdt, Aniruddha G. ...
What do our computer systems do all day? How do we make sure they continue doing it when failures occur? Traditional approaches to answering these questions often involve in-band m...
Dan Pelleg, Muli Ben-Yehuda, Richard Harper, Lisa ...