The development of high-performance libraries has become extraordinarily difficult due to multiple processor cores, vector instruction sets, and deep memory hierarchies. Often, t...
Abstract. We introduce a collection of high performance kernels for basic linear algebra. The kernels encapsulate small xed size computations in order to provide building blocks fo...
We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size t...
This paper presents a parameterized soft core generator for the discrete Fourier transform (DFT). Reusable IPs of digital signal processing (DSP) kernels are important time-saving...
Grace Nordin, Peter A. Milder, James C. Hoe, Marku...
— This paper presents work-in-progress towards a C++ source-to-source translator that automatically seeks parallelisable code fragments and replaces them with code for a graphics...
Jay L. T. Cornwall, Olav Beckmann, Paul H. J. Kell...