ct Fault Tolerant MPI (FT-MPI)[6] was designed as a solution to allow applications different methods to handle process failures beyond simple check-point restart schemes. The init...
Graham E. Fagg, Thara Angskun, George Bosilca, Jel...
The advent of large scale multi-hop wireless networks highlights problems of fault tolerance and scale in distributed system, motivating designs that autonomously recover from tra...
Even though its advantages such as non-volatility, fast write access time and solid-state shock resistance, NAND flash memory suffers from out-place-update, limited erase cycles, a...
Song-Hwa Park, Tae-Hoon Kim, Tae-Hoon Lee, Ki-Dong...
Redundant threading architectures duplicate all instructions to detect and possibly recover from transient faults. Several lighter weight Partial Redundant Threading (PRT) archite...
Vimal K. Reddy, Eric Rotenberg, Sailashri Parthasa...
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all n computers are up and running, we would like the load to be evenly distr...