The significant growth in computational power of modern Graphics Processing Units (GPUs) coupled with the advent of general-purpose programming environments like NVIDIA's CUDA,...
Kishore Kothapalli, Rishabh Mukherjee, M. Suhail R...
Many parallel systems offer a simple view of memory: all storage cells are addressed uniformly. Despite a uniform view of the memory, the machines differ significantly in their memo...
The Merrimac supercomputer uses stream processors and a high-radix network to achieve high performance at low cost and low power. The stream architecture matches the capabilities o...
Mattan Erez, Jung Ho Ahn, Ankit Garg, William J. D...
Memory size reduction and memory access optimization are crucial issues for embedded systems. In the context of affine programs, these two challenges are classically tackled by ...
In this paper we propose a simple extension to the I/O architecture of scalable multiprocessors that optimizes page swap-outs significantly. More specifically, we propose the use o...