Validating distributed systems is particularly difficult, since failures may occur due to a correlated occurrence of faults in different parts of the system. This paper describes ...
Michel Cukier, Ramesh Chandra, David Henke, Jessic...
Distributed applications can fail in subtle ways that depend on the state of multiple parts of a system. This complicates the validation of such systems via fault injection, since...
Ramesh Chandra, Ryan M. Lefever, Michel Cukier, Wi...
This paper describes a methodology for the development of real-time systems and shows its application to the modeling, analysis and implementation of distributed multimedia system...
As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Fortunately, modern High Performance C...
In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM® SP™ systems, a self-defining interv...
Ching-Farn Eric Wu, Anthony Bolmarcich, Marc Snir,...