Initial versions of MPI were designed to work efficiently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to suppor...
As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...
In this paper, we present adaptive fault-tolerant deadlock-free routing algorithms for hypercubes and meshes by using only 3 virtual channels and 2 virtual channels respectively. ...
— Three-dimensional die stacking integration provides the ability to stack multiple layers of processed silicon with a large number of vertical interconnects. Through Silicon Via...
Igor Loi, Subhasish Mitra, Thomas H. Lee, Shinobu ...
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present...
Yair Amir, Brian A. Coan, Jonathan Kirsch, John La...