Sciweavers

ICS
2009
Tsinghua U.
14 years 7 months ago
High-performance CUDA kernel execution on FPGAs
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...
Alexandros Papakonstantinou, Karthik Gururaj, John...
ICS
2009
Tsinghua U.
14 years 7 months ago
Designing multi-socket systems using silicon photonics
Future single-board multi-socket systems may be unable to deliver the needed memory bandwidth electrically due to power limitations, which will hurt their ability to drive perform...
Scott Beamer, Krste Asanovic, Christopher Batten, ...
ICS
2009
Tsinghua U.
14 years 7 months ago
MPI-aware compiler optimizations for improving communication-computation overlap
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...
ICS
2009
Tsinghua U.
14 years 7 months ago
Fast and scalable list ranking on the GPU
General purpose programming on the graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest perfo...
M. Suhail Rehman, Kishore Kothapalli, P. J. Naraya...
ICS
2009
Tsinghua U.
14 years 7 months ago
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
Jiayuan Meng, Kevin Skadron
ICS
2009
Tsinghua U.
14 years 7 months ago
Cancellation of loads that return zero using zero-value caches
The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads—loads accessing memory loc...
Md. Mafijul Islam, Sally A. McKee, Per Stenstr&oum...
ICS
2009
Tsinghua U.
14 years 7 months ago
High-performance regular expression scanning on the Cell/B.E. processor
Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in ever...
Daniele Paolo Scarpazza, Gregory F. Russell
ICS
2009
Tsinghua U.
14 years 7 months ago
Parametric multi-level tiling of imperfectly nested loops
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...
Albert Hartono, Muthu Manikandan Baskaran, C&eacut...