Abstract. Most parallel systems on which MPI is used are now hierarchical: some processors are much closer to others in terms of interconnect performance. One of the most common su...
Hao Zhu, David Goodell, William Gropp, Rajeev Thak...
Abstract. In this paper we study the existence of a small set T of spanning trees that collectively “1-span” an interval graph G. In particular, for any pair of vertices u, v w...
Derek G. Corneil, Feodor F. Dragan, Ekkehard K&oum...
—The implementation and optimization of collective communication operations is an important field of active research. Such operations directly influence application performance...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
Abstract—We discuss issues in designing sparse (nearest neighbor) collective operations for communication and reduction operations in small neighborhoods for the Message Passing ...
Abstract. We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...