Autonomous robots offer alluring perspectives in numerous application domains: space rovers, satellites, medical assistants, tour guides, etc. However, a severe lack of trust in t...
Computer systems are usually made fault tolerant through replication. By replicating a service on multiple servers we make sure that if some replicas fail, the service can still b...
Parisa Jalili Marandi, Marco Primi, Fernando Pedon...
Reliability is a major requirement for most safety-related systems. To meet this requirement, fault-tolerant techniques such as hardware replication and software re-execution are ...
Jia Huang, Jan Olaf Blech, Andreas Raabe, Christia...
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance parall...
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrin...
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...