Fail-stop failures in distributed environments are often tolerated by checkpointing or message logging. In this paper, we show that fail-stop process failures in ScaLAPACK matrix ...
In this paper, we study fault-tolerantmulticastingin multistage interconnection networks (MINs) for constructing large-scale multicomputers. In addition to point-to-point routing ...
Jinsoo Kim, Jaehyung Park, Jung Wan Cho, Hyunsoo Y...
The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. The logical network topologies must also suppo...
The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network which directly links arbitrary pairs of processor nodes wit...
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple v...