Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of progr...
Manman Ren, Ji Young Park, Mike Houston, Alex Aike...
We derive a recursive general-radix pruned Cooley-Tukey fast Fourier transform (FFT) algorithm in Kronecker product notation. The algorithm is compatible with vectorization and pa...
The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory acc...
Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers,...
Automatic optimization of address offset assignment for DSP applications, which reduces the number of address arithmetic instructions to meet the tight memory size restrictions an...
Code restructuring compilers rely heavily on program analysis techniques to automatically detect data dependences between program statements. Dependences between statement instanc...
Johnnie Birch, Robert A. van Engelen, Kyle A. Gall...