In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole rout...
Laxmi N. Bhuyan, Hu-Jun Wang, Ravi R. Iyer, Akhile...
—With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as w...
Jiangtian Li, Xiaosong Ma, Karan Singh, Martin Sch...
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for d...
In order to produce MPI applications that perform well on today’s parallel architectures, programmers need effective tools for collecting and analyzing performance data. Because ...
Shirley Moore, David Cronk, Kevin S. London, Jack ...
This paper presents a theoretical study to evaluate the performance of a family of parallel implementations of the propagation algorithm. The propagation algorithm is used to an i...
Leonardo Brenner, Luiz Gustavo Fernandes, Paulo Fe...