Parallel machines are typically space-shared, or time-shared such that only one application executes on a group of nodes at any given time. It is generally assumed that executing ...
Effective overlap of computation and communication is a well-understood technique for latency hiding and can yield significant performance gains for applications on high-end compu...
Aniruddha G. Shet, P. Sadayappan, David E. Bernhol...
This paper discusses the use of a parallel discrete-event network emulator called the Internet Protocol Traffic and Network Emulator (IP-TNE) for Web server benchmarking. The expe...
Rob Simmonds, Carey L. Williamson, Russell Bradfor...
A method is presented for modeling application performance on parallel computers in terms of the performance of microkernels from the HPC Challenge benchmarks. Specifically, the a...
MPICH2 provides a layered architecture for implementing MPI-2. In this paper, we provide a new design for implementing MPI-2 over InfiniBand by extending the MPICH2 ADI3 layer. Ou...