Abstract--On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When finegranularity of data communication and synchronization is requ...
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir,...
Data consistency is very desirable because strong semantic properties make it easier to write correct programs that perform as users expect. However, there are good reasons why co...
Recently, graphics processing units, or GPUs, have become a viable alternative as commodity, parallel hardware for generalpurpose computing, due to their massive data-parallelism,...
Ke Yang, Bingsheng He, Rui Fang, Mian Lu, Naga K. ...
As high-end computing systems continue to grow in scale, the performance that applications can achieve on such large scale systems depends heavily on their ability to avoid explic...
Gopalakrishnan Santhanaraman, Pavan Balaji, K. Gop...
The MPI standard provides tool builders with an efficient profiling interface, PMPI. Although many tools have successfully used this interface, it has three major drawbacks: a n...