We derive a recursive general-radix pruned Cooley-Tukey fast Fourier transform (FFT) algorithm in Kronecker product notation. The algorithm is compatible with vectorization and pa...
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast ...
In efforts to overcome the complexity of the syntax and the lack of formal semantics of conventional hardware description languages, a number of functional hardware description la...
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full de nition of BSPl...
Jonathan M. D. Hill, Bill McColl, Dan C. Stefanesc...
- Multithreading aims to tolerate latency by overlapping communication with computation. This report explicates the multithreading capabilities of the EM-X distributed-memory multi...
Andrew Sohn, Yuetsu Kodama, Jui Ku, Mitsuhisa Sato...