In distributed hybrid computing systems, traditional sequential processors are loosely coupled with reconfigurable hardware for optimal performance. This loose coupling proves to...
HPL is a parallel Linpack benchmark package widely used for performance testing of large cluster systems. On HPL data layout among processors, a law to determine block size NB theoret...
Blockwise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is f...
Frank K. H. A. Dehne, Wolfgang Dittrich, David A. ...
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layouts and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implic...