We discuss the design and high-performance implementation of collective communications operations on distributed-memory computer architectures. Using a combination of known techni...
Ernie Chan, Marcel Heimlich, Avi Purkayastha, Robe...
Accurate, reproducible and comparable measurement of the overheads, communication times and progression behavior of blocking and nonblocking collective operations is a complicated...
This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling, and (ii) n-way data analysis techniques i...
This paper defines and describes the properties of a multicast virtual topology, the M-array, and a resource-efficient variation, the REM-array. It is shown how several collective op...
This paper presents and validates performance models for a variety of high-performance collective communication algorithms for systems with Cell processors. The systems modeled in...