—Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate executi...
This paper describes a toolset, PACE, that provides detailed predictive performance information throughout the implementation and execution stages of an application. It is structur...
Darren J. Kerbyson, John S. Harper, Efstathios Pap...
Abstract—The increasing number of cores per node in highperformance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rel...
In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the instruction cache I-Cache and the CPU core. This mechanism can provid...
Nikolaos Bellas, Ibrahim N. Hajj, Constantine D. P...
The size and complexity of current custom VLSI have forced the use of high-level programming languages to describe hardware, and compiler and synthesis technology bstract designs ...
Darin Petkov, Randolph E. Harr, Saman P. Amarasing...