Abstract--We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
Continued advancements in fabrication technology and reductions in feature size create challenges in maintaining both manufacturing yield rates and long-term reliability of device...
Premkishore Shivakumar, Stephen W. Keckler, Charle...
Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault t...
Man-Lap Li, Pradeep Ramachandran, Ulya R. Karpuzcu...
Fibonacci Cubes (FCs), together with the enhanced and extended forms, are a family of interconnection topologies formed by diluting links from binary hypercube. While they scale up...
Abstract—A Network-on-Chip (NoC) replaces on-chip communication implemented by point-to-point interconnects in a multi-core environment by a set of shared interconnects connected...