In prior work, we proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatically translating OpenMP programs to ...
We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (...
A key step in program optimization is the determination of optimal values for code optimization parameters such as cache tile sizes and loop unrolling factors. One approach, which...
We describe how we have parallelized Python, an interpreted, object-oriented scripting language, and used it to build an extensible message-passing molecular dynamics application f...
Real-time systems usually operate in an environment that changes continuously. These changes cause the performance of the system to vary during run time. An allocation of resource...