Sciweavers

PPOPP
2009
ACM

MPIWiz: subgroup reproducible replay of mpi applications

15 years 1 months ago
MPIWiz: subgroup reproducible replay of mpi applications
Message Passing Interface (MPI) is a widely used standard for managing coarse-grained concurrency on distributed computers. Debugging parallel MPI applications, however, has always been a particularly challenging task due to their high degree of concurrent execution and non-deterministic behavior. Deterministic replay is a potentially powerful technique for addressing these challenges, with existing MPI replay tools adopting either data-replay or order-replay approaches. Unfortunately, each approach has its tradeoffs. Data-replay generates substantial log sizes by recording every communication message. Order-replay generates small logs, but requires all processes to be replayed together. We believe that these drawbacks are the primary reasons that inhibit the wide adoption of deterministic replay as the critical enabler of cyclic debugging of MPI applications. This paper describes subgroup reproducible replay (SRR), a hybrid deterministic replay method that provides the benefits of bo...
Ruini Xue, Xuezheng Liu, Ming Wu, Zhenyu Guo, Weng
Added 25 Nov 2009
Updated 25 Nov 2009
Type Conference
Year 2009
Where PPOPP
Authors Ruini Xue, Xuezheng Liu, Ming Wu, Zhenyu Guo, Wenguang Chen, Weimin Zheng, Zheng Zhang, Geoffrey M. Voelker
Comments (0)