Emergence of new parallel architectures presents new challenges for application developers. Supercomputers vary in processor speed, network topology, interconnect communication ch...
Abhinav Bhatele, Lukasz Wesolowski, Eric J. Bohm, ...
In this paper, we analyze restrictions of traditional communication performance models affecting the accuracy of analytical prediction of the execution time of collective communic...
Alexey L. Lastovetsky, Vladimir Rychkov, Maureen O...
With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely lesser on ...
Pavan Balaji, Anthony Chan, William Gropp, Rajeev ...
As high-end computing systems continue to grow in scale, recent advances in multiand many-core architectures have pushed such growth toward more denser architectures, that is, mor...
Pavan Balaji, Darius Buntinas, David Goodell, Will...
We report on the development of a new computational framework for efficiently carrying out parallel data redistribution in a limited memory environment. This new library, MADRE (T...
Scientific workflows are frequently being used to model complex phenomena, to analyze instrumental data, to tie together information from distributed sources, and to pursue other ...
We describe and evaluate a new, pipelined algorithm for large, irregular all-gather problems. In the irregular all-gather problem each process in a set of processes contributes in...
Sparse matrix operations achieve only small fractions of peak CPU speeds because of the use of specialized, indexbased matrix representations, which degrade cache utilization by i...