In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Pr...
John A. Stratton, Vinod Grover, Jaydeep Marathe, B...
Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O bound tasks. Although c...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application's main thread. The focus is data prefetch helper threads that lac...
Instruction scheduling is an important compiler technique for exploiting more instruction-level parallelism (ILP) in high-performance microprocessors, and in this paper, we study ...
Modern multi-core architectures have become popular because of the limitations of deep pipelines and heating and power concerns. Some of these multi-core architectures such as the...