The trend in workstation hardware is towards symmetric shared-memory multiprocessors (SMPs). User expectations are for (largely) automatic exploitation of parallelism on an SMP, similar to automatic exploitation of modern processor features such as caches and instruction scheduling. In this paper, we present our solution to automatic SMP parallelization. Our solution is unique in its robust support for unbalanced processor loads and nesting of parallel loops and parallel sections, in conjunction with its tight integration with high-order transformations for improved uniprocessor performance, so that the speedup due to parallelism is truly a multiplicative speedup over highly optimized uniprocessor execution times.
Jyh-Herng Chow, Leonard E. Lyon, Vivek Sarkar