Sciweavers

10 search results - page 2 / 2
» How Does Resource Utilization Affect Fault Tolerance
HPCC
2010
Springer
13 years 8 months ago
A Generic Execution Management Framework for Scientific Applications
Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long running jobs. Increasing fault tolera...
Tanvire Elahi, Cameron Kiddle, Rob Simmonds
SC
2009
ACM
14 years 2 months ago
Flexible cache error protection using an ECC FIFO
We present ECC FIFO, a mechanism enabling two-tiered last-level cache error protection using an arbitrarily strong tier-2 code without increasing on-chip storage. Instead of addin...
Doe Hyun Yoon, Mattan Erez
RTAS
2009
IEEE
14 years 2 months ago
Adaptive Failover for Real-Time Middleware with Passive Replication
Supporting uninterrupted services for distributed soft real-time applications is hard in resource-constrained and dynamic environments, where processor or process failures and sys...
Jaiganesh Balasubramanian, Sumant Tambe, Chenyang ...
PDP
2010
IEEE
14 years 2 months ago
The Design and Implementation of the SWIM Integrated Plasma Simulator
As computing capabilities have increased, the coupling of computational models has become an increasingly viable and therefore important way of improving the physical fi...
Wael R. Elwasif, David E. Bernholdt, Aniruddha G. ...
SIGOPS
2008
13 years 7 months ago
Vigilant: out-of-band detection of failures in virtual machines
What do our computer systems do all day? How do we make sure they continue doing it when failures occur? Traditional approaches to answering these questions often involve in-band m...
Dan Pelleg, Muli Ben-Yehuda, Richard Harper, Lisa ...