We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...
Passive monitoring or testing of complex systems and networks running in the field can provide valuable insights into their behavior in actual environments of use. In certain con...
Distributed denial-of-service (DDoS) attacks pose a significant threat to the Internet. Most solutions proposed to-date face scalability problems as the size and speed of the netwo...
P. E. Ayres, H. Sun, H. Jonathan Chao, Wing Cheong...
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
Software monocultures are usually considered dangerous because their size and uniformity represent the potential for costly and widespread damage. The emerging concept of collabor...
Michael E. Locasto, Stelios Sidiroglou, Angelos D....