In this paper, we investigate algorithms for generating communication code to run on distributed-memory systems. We modify algorithms from previously published work and prove that the algorithms produce correct code. We then extend these algorithms to incorporate the mapping of virtual processors to physical processors and prove the correctness of this extension. This technique can reduce the number of interprocessor messages. In the examples that we show, the total number of messages was reduced from O(N^2) to O(P^2), where N is the input size and P is the number of physical processors. It is important to revisit communication code generation and to introduce a formal specification of how mapping is incorporated into it so that we can make use of the many scheduling heuristics proposed in the literature. We need a generalized mapping function so that we can apply different mapping and scheduling heuristics proposed in the literature for ea...
Clayton S. Ferner