Data parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization on a colored graph. We present a technique that records the run time inter-node communication caused by the movement of array data between nodes during execution and builds the colored graph, and provide a simple algorithm that optimizes the coloring of this graph to describe new data distributions that would result in less inter-node communication. From the distribution information, we write compiler pragmas to be used in the application program. Using these techniques, we traced the execution of a real data-parallel application (written in CM Fortran) and collected the array access information. We computed new distributions that should provide an overall reduction in program execution time. However, compiler optimizations and poor interfaces between the compiler and runtime systems counteracted any potential benefit from t...
Krishna Kunchithapadam, Barton P. Miller