- In this paper we experiment with two optimization techniques that we are considering for a parallelizing compiler that generates message-passing code for distributed-memory systems. We have found that automatically generated message-passing code often exhibits two problems: 1) messages contain redundant data, and 2) the same data is sometimes transmitted to several processors, yet the message is repacked for each one. Our experiments demonstrate that suppressing the packing of redundant data in a message is indeed worthwhile: it not only improved performance but also allowed us to run the program on a larger input size. We also found that suppressing the repacking of identical messages is not worthwhile, because message size is a greater factor in the performance of a message-passing program than the number of instructions executed.
P. Jerry Martin, Clayton S. Ferner
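
The two optimizations named in the abstract can be illustrated with a minimal sketch. It is not the authors' compiler output: the function names (`pack_naive`, `pack_deduplicated`, `payloads_for`), the representation of a message as a list of `(index, value)` pairs, and the sample data are all assumptions made for illustration, and no real message-passing library is used.

```python
def pack_naive(refs, data):
    """Pack one (index, value) pair per reference, duplicates included."""
    return [(i, data[i]) for i in refs]


def pack_deduplicated(refs, data):
    """Pack each referenced element only once, in first-seen order.

    This models suppressing redundant data in a message.
    """
    seen = set()
    message = []
    for i in refs:
        if i not in seen:
            seen.add(i)
            message.append((i, data[i]))
    return message


def payloads_for(dests, refs, data, reuse=True):
    """Build the message for each destination processor.

    With reuse=True the message is packed once and shared by all
    destinations (suppressing repacking); otherwise it is repacked
    for every destination. The message size is the same either way.
    """
    if reuse:
        message = pack_deduplicated(refs, data)
        return {d: message for d in dests}
    return {d: pack_deduplicated(refs, data) for d in dests}


# Hypothetical example: six array references that touch three elements.
data = [10.0, 20.0, 30.0, 40.0]
refs = [0, 1, 1, 2, 1, 0]

naive = pack_naive(refs, data)
compact = pack_deduplicated(refs, data)
shared = payloads_for([1, 2, 3], refs, data, reuse=True)
repacked = payloads_for([1, 2, 3], refs, data, reuse=False)
```

In this sketch, deduplication shrinks the message itself, whereas reusing a packed message only saves packing work and leaves the transmitted size unchanged, which matches the distinction the abstract draws between the two techniques.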