Wide SIMD-based GPUs have evolved into a promising platform for running general-purpose workloads. Current programmable GPUs allow even code with irregular control to execute well...
This paper presents a case study about the applicability and usage of non-blocking collective operations. These operations provide the ability to overlap communication with computa...
Torsten Hoefler, Peter Gottschling, Andrew Lumsdai...