Many standardized hardware communication interfaces offer runtime flexibility and configurability at the cost of efficiency. An alternate approach is the use of a highly-effic...
Steve Ward, Karim Abdalla, Rajeev Dujari, Michael ...
—Data dependence analysis techniques are the main component of today’s strategies for automatic detection of parallelism. Parallelism detection strategies are being incorporate...
Mapping data to parallel computers aims at minimizing the execution time of the associated application. However, it can take an unacceptable amount of time in comparison with the ...
Nashat Mansour, Ravi Ponnusamy, Alok N. Choudhary,...
Shared-memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-pas...
Performance monitoring of large scale parallel computers creates a dilemma: we need to collect detailed information to find performance bottlenecks, yet collecting all this data ...
This paper examines the effectiveness of decoupling as an optimization technique for high-performance computer architectures. Decoupled access execute architectures are described,...
Peter L. Bird, Alasdair Rawsthorne, Nigel P. Topha...
: The EM-4 is a supercomputer that offers very fast inter processor communication and support for multi threading. In this paper we demonstrate that the EM-4, Together with an auto...