In order to produce MPI applications that perform well on today’s parallel architectures, programmers need effective tools for collecting and analyzing performance data. Because ...
Shirley Moore, David Cronk, Kevin S. London, Jack ...
Jackal is a fine-grained distributed shared memory implementation of the Java programming language. Jackal implements Java’s memory model and allows multithreaded Java programs...
Ronald Veldema, Rutger F. H. Hofman, Raoul Bhoedja...
This paper presents a high-level approach for assessing the performance behavior of complex scientific applications running on a high-performance system through simulation. The pr...
Thomas Fahringer, Nicola Mazzocca, Massimiliano Ra...
The Merrimac supercomputer uses stream processors and a high-radix network to achieve high performance at low cost and low power. The stream architecture matches the capabilities o...
Mattan Erez, Jung Ho Ahn, Ankit Garg, William J. D...
The complexity of parallel I/O systems lies in the deep I/O stack with many software layers and concurrent I/O request handling at multiple layers. This paper explores multi-layer...