An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark ...
As bioinformatics is an emerging application of high performance computing, this paper first evaluates the memory performance of several representative bioinformatics application...
Guangming Tan, Lin Xu, Shengzhong Feng, Ninghui Su...
In a previous paper we show how the FLAME methods and tools provide a solution to compute dense dense linear algebra operations on a multi-GPU platform with reasonable performance...
We show how the traditional protocol stack, such as TCP/IP, can be eliminated for socket based high speed communication within a cluster. The SCI shared memory interconnect is used...
This paper describes an e cient implementation of MPI on the Memory-Based Communication Facilities; Memory-Based FIFO is used for bu ering by the library, and Remote Write for comm...