Reformulating an algorithm to mask communication delays is crucial to maintaining scalability, but traditional solutions embed the overlap strategy in the application. We present an alternative approach, based on dataflow, that factors the overlap strategy out of the application. Using this approach, we are able to reduce communication delays, meeting and in many cases exceeding the performance obtained with traditional hand-coded applications.

Key words: parallel programming, latency tolerance, non-SPMD, coarse-grain dataflow
Jacob Sorensen, Scott B. Baden