To fully tap into the potential of heterogeneous machines composed of multicore processors and multiple accelerators, simple offloading approaches in which the main trunk of the ap...
Improving program locality has become increasingly important on modern computer systems. An effective strategy is to group computations on the same data so that once the data are ...
We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (...
Task graph scheduling has been found effective in performance prediction and optimization of parallel applications. A number of static scheduling algorithms have been proposed for...
Contemporary long-term storage devices feature powerful embedded processors and sizeable memory buffers. Active Storage Devices (ASD) is the hard disk technology that makes use of ...