Sciweavers

SC
1992
ACM

Optimal Tracing and Replay for Debugging Message-Passing Parallel Programs

A common debugging strategy involves reexecuting a program (on a given input) over and over, each time gaining more information about bugs. Such techniques can fail on message-passing parallel programs. Because of variations in message latencies and process scheduling, different runs on the given input may produce different results. This non-repeatability is a serious debugging problem, since an execution cannot always be reproduced to track down bugs. This paper presents a technique for tracing and replaying message-passing programs for debugging. Our technique is optimal in the common case and has good performance in the worst case. By making run-time tracing decisions, we trace only a fraction of the total number of messages, gaining two orders of magnitude reduction over traditional techniques which trace every message. Experiments indicate that only 1% of the messages often need be traced. These traces are sufficient to provide replay, allowing an execution to be reproduced any n...
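To make the idea of trace-driven replay concrete, here is a minimal sketch in Python. It is not the technique from the paper: it uses the traditional baseline the abstract mentions, recording the delivery order of every message, whereas the paper traces only a fraction of them. The function name `run`, its parameters, and the simulated single receiver are all hypothetical, chosen for illustration only.

```python
import random

def run(messages, seed=None, trace=None):
    """Deliver `messages` (a list of (sender, payload) pairs) to one receiver.

    Hypothetical illustration only. Without `trace`, the delivery order of
    senders is shuffled to mimic variable message latency. With `trace`,
    messages are delivered in the recorded sender order instead.
    Returns (delivered payloads, sender order used).
    """
    # Keep a FIFO queue per sender so each sender's own messages stay ordered.
    pending = {}
    for sender, payload in messages:
        pending.setdefault(sender, []).append(payload)

    if trace is None:
        # Record mode: nondeterministic interleaving across senders.
        rng = random.Random(seed)
        order = [sender for sender, _ in messages]
        rng.shuffle(order)
    else:
        # Replay mode: force the interleaving captured earlier.
        order = list(trace)

    delivered = [pending[sender].pop(0) for sender in order]
    return delivered, order


msgs = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("C", "c1")]
first, recorded = run(msgs)                  # order may differ between runs
replayed, _ = run(msgs, trace=recorded)      # reproduces the recorded run
assert replayed == first
```

The recorded list of senders is the whole trace in this sketch. Replaying it reproduces the original delivery order no matter how the messages would otherwise have arrived, which is the property the abstract describes. Per the abstract, the paper's contribution is to achieve this while tracing far fewer messages.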
Robert H. B. Netzer, Barton P. Miller
Added 10 Aug 2010
Updated 10 Aug 2010
Type Conference
Year 1992
Where SC
Authors Robert H. B. Netzer, Barton P. Miller