On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance...
Many image and signal processing kernels can be optimized for performance consuming a reasonable area by doing loops parallelization with extensive use of pipelining. This paper p...
Zubair Nawaz, Thomas Marconi, Koen Bertels, Todor ...
This paper describes a methodology of creating a built-in diagnostic system of a System on Chip and experimental results of the system application on the AT94K FPSLIC with cores d...
The advantages of pattern-based programming have been well-documented in the sequential literature. However patterns have yet to make their way into mainstream parallel computing,...
Steven Bromling, Steve MacDonald, John Anvik, Jona...
Many compute-intensive applications generate single result values by accessing clusters of nearby points in grids of one, two, or more dimensions. Often, the performance of FGPA i...