HeteroSort load balances and sorts within static or dynamic networks using a conceptual torus mesh. We ported HeteroSort to a 16-node Beowulf cluster with a central switch architec...
Pamela Yang, Timothy M. Kunau, Bonnie Holte Bennet...
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their com...
Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenge...
Volodymyr V. Kindratenko, Jeremy Enos, Guochun Shi...
Modern computational science applications are becoming increasingly multidisciplinary, involving widely distributed research teams and their underlying computational platforms. A ...
Hasan Abbasi, Matthew Wolf, Karsten Schwan, Greg E...
This paper introduces ThreadMill, a distributed and parallel component architecture for applications that process large volumes of streamed (time-sequenced) data, such a...