Scalable and fault tolerant runtime environments are needed to support and adapt to the underlying libraries and hardware which require a high degree of scalability in dynamic larg...
Thara Angskun, Graham E. Fagg, George Bosilca, Jel...
Abstract. Sensor relocation protocols can be employed as fault tolerance approach to offset the coverage loss caused by node failures. We introduce a novel localized structure, in...
The design of safety-critical systems has typically adopted static techniques to simplify error detection and fault tolerance. However, economic pressure to reduce costs is exposi...
RAID architectures have been used for more than two decades to recover data upon disk failures. Disk failure is just one of the many causes of damaged data. Data can be damaged by...
The efficiency of service discovery is critical in the development of fully decentralized middleware intended to manage large scale computational grids. This demand influenced t...