The memory hierarchy of most multicore systems contains one or more levels of cache that is shared among multiple cores. The shared-cache architecture presents many opportunities f...
Abstract--This paper explores the computation and communication overlap capabilities enabled by the new CORE-Direct hardware capabilities introduced in the InfiniBand (IB) Host Cha...
Richard L. Graham, Stephen W. Poole, Pavel Shamis,...
We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk-I/...
Priya Natarajan, Thomas H. Cormen, Elena Riccio St...
Abstract--A hybrid MPI/Pthreads parallelization was implemented in the RAxML phylogenetics code. New MPI code was added to the existing Pthreads production code to exploit parallel...