Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where cross-processor data dependencies can inhibit efficient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor node performs its computation in blocks; after each, it sends data to the next processor in the pipeline. The amount of computation before sending a message is called the block size; its choice, although difficult for a compiler to make, is critical to the efficiency of the program. Compilers typically use a static estimation of workload, which cannot always produce an effective block size. This paper describes a flexible run-time approach to choosing the block size. Our system takes measurements during the first iteration of the program and then uses the results to build an execution model and choose an appropriate block size whi...
David K. Lowenthal, Michael James
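
The block-size trade-off described in the abstract can be illustrated with a minimal sketch. It uses a generic textbook-style cost model, not the execution model built by the system described in this paper: each pipeline stage is assumed to cost the block's computation plus a fixed message startup cost plus a per-column transfer cost, and the pipeline is assumed to take (number of blocks + number of processors - 1) stages to drain. The function names and all parameters (`work_per_col`, `msg_startup`, `msg_per_col`) are hypothetical and chosen only for illustration.

```python
import math


def pipeline_time(block_size, n_cols, n_procs, work_per_col, msg_startup, msg_per_col):
    """Estimate pipelined loop time under an assumed, simplified cost model.

    Assumption: every stage computes `block_size` columns and then sends one
    message whose cost is a fixed startup plus a per-column transfer term.
    """
    n_blocks = math.ceil(n_cols / block_size)
    stage = block_size * work_per_col + msg_startup + block_size * msg_per_col
    # The last processor finishes after the pipeline fills (n_procs - 1 stages)
    # and then processes all of its own blocks.
    return (n_blocks + n_procs - 1) * stage


def best_block_size(n_cols, n_procs, work_per_col, msg_startup, msg_per_col):
    """Exhaustively pick the block size with the lowest modeled time."""
    return min(
        range(1, n_cols + 1),
        key=lambda b: pipeline_time(
            b, n_cols, n_procs, work_per_col, msg_startup, msg_per_col
        ),
    )


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units; a run-time system would
    # measure such values rather than hard-code them.
    params = dict(n_cols=1024, n_procs=8, work_per_col=1.0,
                  msg_startup=100.0, msg_per_col=0.1)
    b = best_block_size(**params)
    print("block size:", b, "modeled time:", pipeline_time(b, **params))
```

Under this assumed model, a very small block size pays the message startup cost too often, while a very large one leaves downstream processors idle while the pipeline fills, so the minimum lies between the two extremes.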