Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...
Modulo scheduling is an e cient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirement...
Alexandre E. Eichenberger, Edward S. Davidson, San...
We argue that the key underpinning of the current state-of-the real-time practice — the priority artifact — and that of the current state-of-the real-time art — deadline-bas...
Abstract. Loops are an important source of optimization. In this paper, we address such optimizations for those cases when loops contain kernels mapped on reconfigurable fabric. We...
Ozana Silvia Dragomir, Elena Moscu Panainte, Koen ...
We want to perform compile-time analysis of an SPMD program and place barriers in it to synchronize it correctly, minimizing the runtime cost of the synchronization. This is the b...