Synchronization operations, such as fence and locking, are used in many parallel operations accessing shared memory. However, a process which is blocked waiting for a fence operat...
Darius Buntinas, Amina Saify, Dhabaleswar K. Panda...
In contrast to the conventional send/receive model, the one-way communication model--using Put and Synch--allows the decoupling of message transmission from synchronization. This ...
Mahmut T. Kandemir, U. Nagaraj Shenoy, Prithviraj ...
In this paper we present a multicost algorithm for the joint time scheduling of the communication and computation resources that will be used by a task. The proposed algorithm sel...
Kostas Christodoulopoulos, Nikolaos D. Doulamis, E...
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
This paper presents several parallel FFT algorithms with different degrees of communication overhead for multiprocessors in a Network-on-Chip (NoC) environment. Three different method...