Sciweavers

197 search results - page 30 / 40
» Detecting phases in parallel applications on shared memory a...
Sort
View
ASPLOS
2009
ACM
14 years 8 months ago
Commutativity analysis for software parallelization: letting program transformations see the big picture
Extracting performance from many-core architectures requires software engineers to create multi-threaded applications, which significantly complicates the already daunting task of...
Farhana Aleen, Nathan Clark
ISCA
2009
IEEE
136views Hardware» more  ISCA 2009»
14 years 2 months ago
ECMon: exposing cache events for monitoring
The advent of multicores has introduced new challenges for programmers to provide increased performance and software reliability. There has been significant interest in technique...
Vijay Nagarajan, Rajiv Gupta
ISHPC
2000
Springer
13 years 11 months ago
Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration
This paper describes transparent mechanisms for emulating some of the data distribution facilities offered by traditional data-parallel programming models, such as High Performance...
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor...
ICS
2009
Tsinghua U.
14 years 2 months ago
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
Jiayuan Meng, Kevin Skadron
IPPS
2010
IEEE
13 years 5 months ago
Servet: A benchmark suite for autotuning on multicore clusters
Abstract--The growing complexity in computer system hierarchies due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of pr...
Jorge González-Domínguez, Guillermo ...