Partitioning data parallel computations across a network of heterogeneous workstations is a difficult problem for the user. We have developed a runtime partitioning method for choosing the number and type of processors to apply to a data parallel computation, and a decomposition of the data domain in order to achieve reduced completion time. The partitioning method utilizes information about the problem in the form of callback functions and uses a set of topology-specific communication functions to estimate communication costs. We show that the method makes effective partitioning decisions and has runtime overhead that is easily tolerated. In particular we show that for two implementations of a canonical stencil application, minimum elapsed times are obtained for a range of problem sizes on a network of heterogeneous workstations.
Jon B. Weissman, Andrew S. Grimshaw
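As an illustration of the kind of decision the abstract describes, the following minimal sketch chooses how many processors to use and how to split the data domain by minimizing an estimated completion time. It is not the authors' algorithm. The names `Processor`, `partition`, `work_per_item`, and `comm_cost` are hypothetical, and the fastest-first candidate ordering and speed-proportional decomposition are simplifying assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Processor:
    name: str
    speed: float  # hypothetical unit: work units completed per second


def partition(
    processors: List[Processor],
    n_items: int,
    work_per_item: Callable[[], float],  # callback describing the problem (assumed form)
    comm_cost: Callable[[int], float],   # topology-specific cost for k processors (assumed form)
) -> Tuple[float, List[Processor], List[int]]:
    """Return (estimated time, chosen processors, items assigned to each)."""
    # Assumption: consider candidate sets made of the k fastest processors.
    ranked = sorted(processors, key=lambda p: p.speed, reverse=True)
    best = None
    for k in range(1, len(ranked) + 1):
        chosen = ranked[:k]
        total_speed = sum(p.speed for p in chosen)
        # Assumption: decompose the data domain in proportion to processor speed.
        shares = [int(n_items * p.speed / total_speed) for p in chosen]
        shares[0] += n_items - sum(shares)  # rounding remainder goes to the fastest
        # The slowest-finishing processor determines the compute time.
        compute = max(s * work_per_item() / p.speed for s, p in zip(shares, chosen))
        estimate = compute + comm_cost(k)
        if best is None or estimate < best[0]:
            best = (estimate, chosen, shares)
    return best


# Usage with made-up numbers: one fast and two slow workstations.
workstations = [Processor("fast", 4.0), Processor("slow", 1.0), Processor("slow2", 1.0)]
estimate, chosen, shares = partition(
    workstations,
    n_items=100,
    work_per_item=lambda: 1.0,
    comm_cost=lambda k: 1.5 * (k - 1) ** 2,  # invented cost that grows with k
)
print(estimate, [p.name for p in chosen], shares)
```

With the invented cost function above, adding the third workstation costs more in communication than it saves in computation, so the sketch selects only two of the three processors. This is the trade-off between processor count and communication cost that a runtime partitioning method must weigh.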