Pipelined wavefront computations are a ubiquitous class of parallel algorithm used for the solution of a num ber of scientific and engineering applications. This paper investig...
Gihan R. Mudalige, Simon D. Hammond, J. A. Smith, ...
Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. MPI is a widely used mess...
—Transactional Memory (TM) takes responsibility for concurrent, atomic execution of labeled regions of code, freeing the programmer from the need to manage locks. Typical impleme...
Michael F. Spear, Michael Silverman, Luke Dalessan...
The increasing availability of multi-core and multiprocessor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Mont...
Jonathan M. R. Byrd, Stephen A. Jarvis, A. H. Bhal...
Current processors exploit out-of-order execution and branch prediction to improve instruction level parallelism. When a branch prediction is wrong, processors flush the pipeline ...