Sciweavers

» Fault Tolerance in a DSM Cluster Operating System
DAC
2011
ACM
12 years 8 months ago
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips
As transistor dimensions continue to scale deep into the nanometer regime, silicon reliability is becoming a chief concern. At the same time, transistor counts are scaling up, ena...
Andrew DeOrio, Konstantinos Aisopos, Valeria Berta...
MIDDLEWARE
2009
Springer
14 years 3 months ago
Why Do Upgrades Fail and What Can We Do about It?
Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading ca...
Tudor Dumitras, Priya Narasimhan
ISORC
2003
IEEE
14 years 1 month ago
A Dynamic Shadow Approach for Mobile Agents to Survive Crash Failures
Fault tolerance schemes for mobile agents to survive agent server crash failures are complex since developers normally have no control over remote agent servers. Some solutions mo...
Simon Pears, Jie Xu, Cornelia Boldyreff
ISCA
2010
IEEE
14 years 1 month ago
Relax: an architectural framework for software recovery of hardware faults
As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whe...
Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaral...
SIGMETRICS
2008
ACM
13 years 8 months ago
Disk scrubbing versus intra-disk redundancy for high-reliability RAID storage systems
Two schemes proposed to cope with unrecoverable or latent media errors and enhance the reliability of RAID systems are examined. The first scheme is the established, widely used d...
Ilias Iliadis, Robert Haas, Xiao-Yu Hu, Evangelos ...