We present design and implementation details as well as performance results for two new parallel checkpointing libraries developed by us for parallel MPI applications. The first o...
This paper describes a portable benchmark suite that assesses the ability of cluster networking hardware and software to overlap MPI communication and computation. The Communicati...
William Lawry, Christopher Wilson, Arthur B. Macca...
Abstract. Parallel Computational Science and Engineering (CSE) applications often exhibit irregular structure and dynamic load patterns. Many such applications have been developed ...
Abstract This paper proposes a new fine-grained data distribution operation MPI Alltoall specific that allows an element-wise distribution of data elements to specific target pro...
An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark ...