: We present a new approach to fault tolerance for High Performance Computing system. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
Deterministic observation and random excitation of fault sites during the ATPG process dramatically reduces the overall defective part level. However, multiple observations of eac...
Sooryong Lee, Brad Cobb, Jennifer Dworak, Michael ...
Fault links represent relationships between the types of mistakes made and the type of module being developed or modified. The existence of such fault links can be used to guide co...
Jane Huffman Hayes, Inies C. M. Raphael, Vinod Kum...
The paper proposes a new concept of diagnosing faulty links in Network-on-a-Chip (NoC) designs. The method is based on functional fault models and it implements packet address dri...
This paper investigates under which conditions information can be reliably shared and consensus can be solved in unknown and anonymous message-passing networks that suffer from cr...