Abstract— We present on-line tunable diagnostic and membership protocols for generic time-triggered (TT) systems to detect crashes, send/receive omission faults and network parti...
Building dependable distributed systems using ad hoc methods is a challenging task. Without proper support, an application programmer must face the daunting requirement of having ...
Jennifer Ren, Michel Cukier, Paul Rubel, William H...
Concerns about the reliability of real-time embedded systems that employ dynamic voltage scaling has recently been highlighted [1,2,3], focusing on transient-fault-tolerance techn...
Alireza Ejlali, Marcus T. Schmitz, Bashir M. Al-Ha...
Autonomous Decentralized System (ADS) has the properties of online expansion, online maintenance and fault tolerance, which are among the important features for next generation in...
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...