The large design space of modern computer architectures calls for performance modelling tools to facilitate the evaluation of different alternatives. In this paper, we give an ove...
Multiple instruction issue processors place high demands on register file bandwidth. One solution to reduce this bottleneck is the use of multiple register files. Register allocat...
David J. Kolson, Alexandru Nicolau, Nikil D. Dutt,...
Stream compaction is a common parallel primitive used to remove unwanted elements in sparse data. This allows highly parallel algorithms to maintain performance over several proce...
We present a new software technology for on-line performance analysis and visualization of complex parallel and distributed systems. Often heterogeneous, these systems need capabi...
Aleksandar M. Bakic, Matt W. Mutka, Diane T. Rover
Global addressing of shared data simplifies parallel programming and complements message passing models commonly found in distributed memory machines. A number of programming sys...
Beng-Hong Lim, Chi-Chao Chang, Grzegorz Czajkowski...