Large clusters, high-availability clusters, and Grid deployments often suffer from network, node, or operating system faults and thus require the use of fault tolerant programmin...
Continued device scaling enables microprocessors and other systems-on-chip (SoCs) to increase their performance, functionality, and hence, complexity. Simultaneously, relentless s...
Recently there has been significant interest in employing probabilistic techniques for fault localization. Using dynamic dependence information for multiple passing runs, learnin...
Parallelism has often been used to improve the reliability and efficiency of a variety of different engineering systems. In this paper, we quantify the efficiency of paralleli...
Jian Tan, Wei Wei, Bo Jiang, Ness Shroff, Donald F...
A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g., matrix multiplication and LINPACK) and bandwidth-i...