Large-scale compute clusters continue to grow in size. However, as clusters and applications grow, the Mean Time Between Failures (MTBF) has redu...
It has recently been shown that fair exchange, a security problem in distributed systems, can be reduced to a fault tolerance problem, namely a special form of distributed consensu...
Carole Delporte-Gallet, Hugues Fauconnier, Felix C...
This paper presents a new distributed computing framework for Many Task Computing (MTC) applications, based on the Extensible Messaging and Presence Protocol (XMPP). A lightweight...
Lance Stout, Michael A. Murphy, Sebastien Goasguen
Fast and accurate fault detection is becoming an essential component of management software for mission-critical systems. A good fault detector makes it possible to initiate repair a...
We present a new test generation procedure for sequential circuits using newly traversed state and newly detected fault information obtained between successive iterations of vecto...
Ashish Giani, Shuo Sheng, Michael S. Hsiao, Vishwa...