Sciweavers

» Is Data Distribution Necessary in OpenMP
EUROPAR
2007
Springer
13 years 11 months ago
On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications
Abstract. Profiling is often the method of choice for performance analysis of parallel applications due to its low overhead and easily comprehensible results. However, a disadvanta...
Karl Fürlinger, Michael Gerndt, Jack Dongarra
EUROPAR
2008
Springer
13 years 9 months ago
Compile-Time and Run-Time Issues in an Auto-Parallelisation System for the Cell BE Processor
Abstract. We describe compiler and run-time optimisations for effective auto-parallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by anno...
Alastair F. Donaldson, Paul Keir, Anton Lokhmotov
IPPS
2010
IEEE
13 years 5 months ago
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
IPPS
2000
IEEE
13 years 12 months ago
Controlling Distributed Shared Memory Consistency from High Level Programming Languages
One of the keys to the success of parallel processing is the availability of high-level programming languages for off-the-shelf parallel architectures. Using explicit message passi...
Yvon Jégou
IPPS
2009
IEEE
14 years 2 months ago
Phaser accumulators: A new reduction construct for dynamic parallelism
A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser a...
Jun Shirako, David M. Peixotto, Vivek Sarkar, Will...