ART: Robustness of Meshes and Tori for Parallel and Distributed Computation

In this paper, we formulate the array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1 + o(1). The number of faults tolerated by ARTs ranges from o(min(n^{1-1/d}, n/d, n/h)) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithm...

Chi-Hsiang Yeh, Behrooz Parhami