Hybrid branch predictors combine the predictions of multiple single-level or two-level branch predictors. The prediction-combining hardware -- the "meta-predictor" -- may ...
Dirk Grunwald, Donald C. Lindsay, Benjamin G. Zorn
In order to achieve practical efficient execution on a parallel architecture, a knowledge of the data dependencies related to the application appears as the key point for building...
Determination of data dependences is a task typically performed with high-level language source code in today's optimizing and parallelizing compilers. Very little work has b...
Wolfram Amme, Peter Braun, Eberhard Zehendner, Fra...
Compile-time scheduling is one approach to extract parallelism which has proved effective when the execution behavior is predictable. Unfortunately, the performance of most priori...
The SGI Origin 2000 is designed to support a wide range of applications and has low local and remote memory latencies. However, it often has a high ratio of remote to local misses....
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
With the growing importance of fast system area networks in the parallel community, it is becoming common for message passing programs to run in multi-programming environ...
Frederick C. Wong, Andrea C. Arpaci-Dusseau, David...
This paper describes a first approach to implementing MPI-2’s Extended Collective Operations. We aimed to ascertain the feasibility and effectiveness of such a project based on exis...
One-sided Communications is one of the extensions to MPI set out in the MPI-2 standard. We present here a thread-based implementation of One-sided Communications written for WMPI, ...
In this paper we describe the difficulties inherent in making accurate, reproducible measurements of message-passing performance. We describe some of the mistakes often made in att...