Sciweavers

619 search results - page 12 / 124
» Programming Distributed Memory Systems Using OpenMP
SC
2000
ACM
13 years 12 months ago
Performance of Hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP cluster...
D. S. Henty
PDP
2009
IEEE
14 years 2 months ago
A Parallel Implementation of the 2D Wavelet Transform Using CUDA
One multicore platform is currently attracting enormous attention because of its tremendous potential for sustained performance: the NVIDIA Tesla boards. The...
Joaquín Franco, Gregorio Bernabé, Ju...
EUROPAR
2007
Springer
13 years 11 months ago
On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications
Profiling is often the method of choice for performance analysis of parallel applications due to its low overhead and easily comprehensible results. However, a disadvanta...
Karl Fürlinger, Michael Gerndt, Jack Dongarra
IPPS
2005
IEEE
14 years 1 month ago
Runtime Empirical Selection of Loop Schedulers on Hyperthreaded SMPs
Hyperthreaded (HT) and simultaneous multithreaded (SMT) processors are now available in commodity workstations and servers. This technology is designed to increase throughput by e...
Yun Zhang, Michael Voss
JPDC
2008
13 years 7 months ago
A performance study of general-purpose applications on graphics processors using CUDA
Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly progr...
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarj...