This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to op...
Yanwei Niu, Ziang Hu, Kenneth E. Barner, Guang R. ...
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, sharedmemory multiprocessors. The algorithm considers data locality, parall...
Assembly lines with closed loop parallel lanes have the potential to continue to be productive when individual stations breakdown. A requirement in such parallel lane systems is t...
Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...
— In this paper we give a theoretical model for determining the synchronization frequency that minimizes the parallel execution time of loops with uniform dependencies dynamicall...
Florina M. Ciorba, Ioannis Riakiotakis, Theodore A...