Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
It has been observed that some applications manipulate large amounts of null data. Moreover these zero data often exhibit high spatial locality. On some applications more than 20%...
With the current trend toward multicore architectures, improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), ...
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...
To sustain emerging data-intensive scientific applications, High Performance Computing (HPC) centers invest a notable fraction of their operating budget on a specialized fast sto...
Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhk...