Sciweavers

PPOPP
2009
ACM
14 years 12 months ago
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
PPOPP
2009
ACM
14 years 12 months ago
Parallelization spectroscopy: analysis of thread-level parallelism in HPC programs
In this paper, we present a thorough analysis of thread-level parallelism available in production High Performance Computing (HPC) codes. We survey a number of techniques that are...
Arun Kejariwal, Calin Cascaval
PPOPP
2009
ACM
14 years 12 months ago
Software transactional distributed shared memory
We have developed a transaction-based approach to distributed shared memory (DSM) that supports object caching and generates path expression prefetches. A path expression specifies...
Alokika Dash, Brian Demsky
PPOPP
2009
ACM
14 years 12 months ago
Serialization sets: a dynamic dependence-based parallel execution model
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operati...
Matthew D. Allen, Srinath Sridharan, Gurindar S. S...
PPOPP
2009
ACM
14 years 12 months ago
Atomic Quake: using transactional memory in an interactive multiplayer game server
Transactional Memory (TM) is being studied widely as a new technique for synchronizing concurrent accesses to shared memory data structures for use in multi-core systems. Much of ...
Adrián Cristal, Eduard Ayguadé, Fera...
PPOPP
2009
ACM
14 years 12 months ago
Mapping parallelism to multi-cores: a machine learning based approach
The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-bas...
Zheng Wang, Michael F. P. O'Boyle
PPOPP
2009
ACM
14 years 12 months ago
A compiler-directed data prefetching scheme for chip multiprocessors
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
PPOPP
2009
ACM
14 years 12 months ago
Effective performance measurement and analysis of multithreaded applications
Understanding why the performance of a multithreaded program does not improve linearly with the number of cores in a shared-memory node populated with one or more multicore process...
Nathan R. Tallent, John M. Mellor-Crummey
PPOPP
2009
ACM
14 years 12 months ago
Parallel thinking
Guy E. Blelloch