Sciweavers

» Optimal recovery schemes in fault tolerant distributed compu...
ICDCS
2000
IEEE
14 years 26 days ago
On Low-Cost Error Containment and Recovery Methods for Guarded Software Upgrading
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containmen...
Ann T. Tai, Kam S. Tso, Leon Alkalai, Savio N. Cha...
CODES
2010
IEEE
13 years 5 months ago
Hardware/software optimization of error detection implementation for real-time embedded systems
This paper presents an approach to system-level optimization of error detection implementation in the context of fault-tolerant realtime distributed embedded systems used for safe...
Adrian Lifa, Petru Eles, Zebo Peng, Viacheslav Izo...
ICAC
2007
IEEE
14 years 2 months ago
Fault-Tolerant Reliable Delivery of Messages in Distributed Publish/Subscribe Systems
Reliable delivery of messages is an important problem that needs to be addressed in distributed systems. In this paper we present our strategy to enable reliable delivery of messa...
Shrideep Pallickara, Hasan Bulut, Geoffrey Fox
PVM
2010
Springer
13 years 6 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas Hé...
DAC
2011
ACM
12 years 8 months ago
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips
As transistor dimensions continue to scale deep into the nanometer regime, silicon reliability is becoming a chief concern. At the same time, transistor counts are scaling up, ena...
Andrew DeOrio, Konstantinos Aisopos, Valeria Berta...