The development of high-performance libraries has become extraordinarily difficult due to multiple processor cores, vector instruction sets, and deep memory hierarchies. Often, t...
Abstract. We introduce a collection of high-performance kernels for basic linear algebra. The kernels encapsulate small fixed-size computations in order to provide building blocks fo...
We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size t...
This paper presents a parameterized soft core generator for the discrete Fourier transform (DFT). Reusable IPs of digital signal processing (DSP) kernels are important time-saving...
Grace Nordin, Peter A. Milder, James C. Hoe, Marku...
This paper presents work-in-progress towards a C++ source-to-source translator that automatically seeks parallelisable code fragments and replaces them with code for a graphics...
Jay L. T. Cornwall, Olav Beckmann, Paul H. J. Kell...