Region-based compilation repartitions a program into more desirable compilation units for optimization and scheduling, particularly beneficial for ILP architectures. With region-...
—Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, ther...
Data parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization on a colored gra...
Parallel architectures are the way of the future, but are notoriously difficult to program. In addition to the low-level constructs they often present (e.g., locks, DMA, and non-...
Most current compiler analysis techniques are unable to cope with the semantics introduced by explicit parallel and synchronization constructs in parallel programs. In this paper ...
Diego Novillo, Ronald C. Unrau, Jonathan Schaeffer