We present an efficient framework for dynamic reconfiguration of application-specific custom instructions. A key component of this framework is an iterative algorithm for temporal...
This paper focuses on generating efficient software pipelined schedules for in-order machines, which we call Converged Trace Schedules. For a candidate loop, we form a string of t...
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to ...
Modulo scheduling is a major optimization of high performance compilers wherein The body of a loop is replaced by an overlapping of instructions from different iterations. Hence ...
Exploiting parallelism at both the multiprocessor level and the instruction level is an e ective means for supercomputers to achieve high-performance. The amount of instruction-le...
Scott A. Mahlke, William Y. Chen, John C. Gyllenha...