GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly progr...
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarj...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
This work presents the first extensive study of singlenode performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
Abstract: Existing multicore systems already provide deep levels of thread parallelism. Hybrid programming models and composability of parallel libraries are very active areas of r...
Costin Iancu, Steven Hofmeyr, Filip Blagojevic, Yi...