Initial versions of MPI were designed to work efficiently on multiprocessors that had very little job control and therefore used static process models. Subsequently forcing them to suppor...
TPT-RAID is a multi-box RAID wherein each ECC group comprises at most one block from any given storage box, and can thus tolerate a box failure. It extends the idea of an out-of-ban...
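The reason the one-block-per-box rule tolerates a box failure is that losing a whole box removes at most one block from each ECC group. The sketch below shows this with single XOR parity. The abstract does not say which code TPT-RAID uses, so XOR parity, the `(box_id, block)` layout, and the function names are all illustrative assumptions.

```python
from functools import reduce

# Hypothetical layout: an ECC group is a list of (box_id, block_bytes) pairs.
# Assumption: one XOR parity block per group (the real ECC is not specified).

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_group(data_blocks, boxes):
    """Place the data blocks plus one parity block, each on a distinct box."""
    if len(boxes) != len(data_blocks) + 1:
        raise ValueError("need one box per data block plus one for parity")
    if len(set(boxes)) != len(boxes):
        raise ValueError("at most one block per box in each ECC group")
    parity = xor_blocks(data_blocks)
    return list(zip(boxes, data_blocks + [parity]))

def recover_after_box_failure(group, failed_box):
    """Rebuild the block this group lost when failed_box went down."""
    survivors = [blk for box, blk in group if box != failed_box]
    if len(survivors) == len(group):
        return None  # the failed box held no block of this group
    # The one-block-per-box rule guarantees exactly one block is missing,
    # and the XOR of all blocks in the group (data + parity) is zero.
    return xor_blocks(survivors)
```

For example, with `group = make_group([b"\x01\x02", b"\x10\x20", b"\xff\x00"], ["A", "B", "C", "D"])`, calling `recover_after_box_failure(group, "B")` rebuilds `b"\x10\x20"`. If two blocks of the same group shared a box, a single box failure could erase both, and one parity block could no longer recover them.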
Designing a distributed fault-tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
A model was introduced in [Fraga97] for integrating replication techniques in heterogeneous systems. The model adopts a reflective structure based on the meta-object approach [10]...
Lau Cheuk Lung, Joni da Silva Fraga, Carlos Mazier...
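The reflective, meta-object structure for replication mentioned in the [Fraga97] abstract above keeps the base object unaware of replication. A meta-object intercepts each invocation and applies it to every replica. The sketch below illustrates only this general idea and does not reproduce the [Fraga97] model. Active replication, deterministic replicas, and the class names `Counter` and `ReplicationMetaObject` are hypothetical choices made for illustration.

```python
class Counter:
    """Base-level object: contains application logic only, no replication."""

    def __init__(self):
        self.value = 0

    def increment(self, n=1):
        self.value += n
        return self.value


class ReplicationMetaObject:
    """Hypothetical meta-object: intercepts calls aimed at the base object
    and reflects each one onto every replica (active replication)."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def __getattr__(self, name):
        # Python calls __getattr__ only for names not found on the
        # meta-object itself, i.e. for the base object's interface.
        def invoke(*args, **kwargs):
            results = [getattr(r, name)(*args, **kwargs) for r in self._replicas]
            # Assumption: replicas are deterministic, so results must agree.
            if any(res != results[0] for res in results):
                raise RuntimeError(f"replicas diverged on {name!r}")
            return results[0]
        return invoke
```

Usage: `proxy = ReplicationMetaObject([Counter() for _ in range(3)])`, then `proxy.increment(5)` updates all three replicas and returns `5`. The client code sees only the base interface. The replication policy lives entirely at the meta level, so it could be swapped without modifying `Counter`.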
The hybrid redundancy structure found at the cellular level of higher animals provides complex organisms with the three key features of a reliability-engineered system: fault toler...