This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods ext...
Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is...
From a software engineering perspective, the Java programming language provides an attractive platform for writing numerically intensive applications. A major drawback hampering i...
Pedro V. Artigas, Manish Gupta, Samuel P. Midkiff,...
This paper describes a source to source compilation tool for optimizing MPI-based parallel applications. This tool is able to automatically apply a “prepushing” transformation...
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus P&uum...