Complex code bases require continual testing to ensure that both new development and routine maintenance do not create unintended side effects. Automation of regression testing is...
Joshua Hursey, Ethan Mallove, Jeffrey M. Squyres, ...
We present initial work on perturbation techniques that cause the manifestation of timing-related bugs in distributed memory Message Passing Interface (MPI)-based applications. Th...
Richard W. Vuduc, Martin Schulz, Daniel J. Quinlan...
The MPI-2 standard added a new feature to MPI called generalized requests. Generalized requests allow users to add new nonblocking operations to MPI while still using man...
Robert Latham, William Gropp, Robert B. Ross, Raje...
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
With the increasing number of processors in modern HPC (High Performance Computing) systems, two emerging problems must be solved. One is scalability, the other is fault tolera...