Because data races represent a hard-to-manage class of errors in concurrent programs, numerous approaches to detect them have been proposed and evaluated. We specifically consider...
Paruj Ratanaworabhan, Martin Burtscher, Darko Kiro...
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all stre...
Xuejun Yang, Li Wang, Jingling Xue, Yu Deng, Ying ...
GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
In this paper, we present a thorough analysis of thread-level parallelism available in production High Performance Computing (HPC) codes. We survey a number of techniques that are...
We have developed a transaction-based approach to distributed shared memory(DSM) that supports object caching and generates path expression prefetches. A path expression specifies...
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operati...
Matthew D. Allen, Srinath Sridharan, Gurindar S. S...
Transactional Memory (TM) is being studied widely as a new technique for synchronizing concurrent accesses to shared memory data structures for use in multi-core systems. Much of ...
The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-bas...
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...