Abstract. Profiling is often the method of choice for performance analysis of parallel applications due to its low overhead and easily comprehensible results. However, a disadvanta...
Abstract. We describe compiler and run-time optimisations for effective autoparallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by anno...
This work presents the first extensive study of singlenode performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
One of the keys for the success of parallel processing is the availability of high-level programming languages for on-the-shelf parallel architectures. Using explicit message passi...
A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser a...
Jun Shirako, David M. Peixotto, Vivek Sarkar, Will...