Brewer and Kuszmaul [BK94] demonstrated how barriers and traffic interleaving can alleviate the problem of bulk-transfer performance degradation on the Thinking Machines CM-5, by ...
Eric A. Brewer, Paul Gauthier, Armando Fox, Angela...
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split ...
Low-overhead message passing is critical to the performance of many applications. Active Messages[27] reduce the software overhead for message handling: messages are run as handle...
Deborah A. Wallach, Wilson C. Hsieh, Kirk L. Johns...
Heterogeneous network-based systems are emerging as attractive computing platforms for HPC applications. This paper discusses fundamental research issues that must be addressed to...
Prashanth B. Bhat, Viktor K. Prasanna, Cauligi S. ...