The current multiprocessors such asCray T3D support interprocessor communication using partitioned dimension-order routers (PDRs). In a PDR implementation, the routing logic and sw...
This paper deals with tolerance to timing faults in time-constrained systems. TAFT (Time Aware Fault-Tolerant) is a recently devised approach which applies tolerance to timing vio...
F. Sandrini, Felicita Di Giandomenico, Andrea Bond...
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
CX, a network-based computational exchange, is presented. The system's design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks...