We present initial work on perturbation techniques that expose timing-related bugs in distributed-memory applications based on the Message Passing Interface (MPI). These techniques improve the coverage of possible message orderings in MPI applications that rely on nondeterministic point-to-point communication, and they work at small processor counts, reducing the need to test at larger scales. Using carefully designed model problems, we show that these techniques aid testing for problems that are often hard to reproduce when running on a small fraction of the machine. Our perturbation layer, JitterBug, builds on PnMPI, an extension of the MPI profiling interface that supports multiple layers of profiling libraries. We discuss how JitterBug complements existing MPI checking tools through the PnMPI framework. We present opportunities to build additional tools that statically analyze and directly transform the source code to support testing and debugging MPI applicatio...
Richard W. Vuduc, Martin Schulz, Daniel J. Quinlan
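
As an illustration of why perturbation can widen the set of message orderings seen at small scale, the following is a minimal toy simulation, not JitterBug itself and not MPI code. It assumes three hypothetical senders whose messages are matched by a single wildcard receive in arrival order; the send times, latencies, jitter bound, and all function names are assumptions chosen for the sketch.

```python
import random
from itertools import permutations


def arrival_order(send_times, latency, jitter, rng):
    """Return sender ranks in the order a wildcard receive would match them.

    Each message arrives at send time + injected delay + network latency.
    The injected delay is drawn uniformly from [0, jitter] (an assumption).
    """
    arrivals = []
    for rank, sent_at in enumerate(send_times):
        delay = rng.uniform(0.0, jitter) if jitter > 0 else 0.0
        arrivals.append((sent_at + delay + latency[rank], rank))
    arrivals.sort()
    return tuple(rank for _, rank in arrivals)


def observed_orderings(trials, jitter, seed=0):
    """Collect the distinct match orders seen over repeated simulated runs."""
    rng = random.Random(seed)
    send_times = [0.0, 1.0, 2.0]  # hypothetical: rank 0 always sends first
    latency = [0.5, 0.5, 0.5]     # hypothetical: uniform network latency
    return {
        arrival_order(send_times, latency, jitter, rng)
        for _ in range(trials)
    }


# Without perturbation, every run reproduces the same match order.
baseline = observed_orderings(trials=200, jitter=0.0)
# With injected delays, other legal orderings start to appear.
perturbed = observed_orderings(trials=200, jitter=5.0)

print(len(baseline), "distinct ordering(s) without jitter")
print(len(perturbed), "distinct ordering(s) with jitter")
```

In this toy model the unperturbed runs always yield the order (0, 1, 2), so a bug that appears only under another order would stay hidden; the injected delays let the same small configuration exercise other orderings that the receive is allowed to match.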