Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole rout...
Laxmi N. Bhuyan, Hu-Jun Wang, Ravi R. Iyer, Akhile...
PULC is a user-level communication library for workstation clusters. PULC provides a multi-user, multi-programming communication library for user level communication on top of high...
Joachim M. Blum, Thomas M. Warschko, Walter F. Tic...
Scheduling jobs on the IBM SP2 system is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order that the jobs ar...
This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machin...
Marcio Merino Fernandes, Josep Llosa, Nigel P. Top...