Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequenc...
Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris...
As the size of FPGA devices grows following Moore’s law, it becomes possible to put a complete manycore system onto a single FPGA chip. The centralized memory hierarchy on typica...
Sen Ma, Miaoqing Huang, Eugene Cartwright, David L...
We describe two novel constructs for programming parallel machines with multi-level memory hierarchies: call-up, which allows a child task to invoke computation on its parent, and...
Michael Bauer, John Clark, Eric Schkufza, Alex Aik...
In recent years, the increasing number of processor cores and limited increases in main memory bandwidth have led to the problem of the bandwidth wall, where memory bandwidth is b...
Guangyu Sun, Christopher J. Hughes, Changkyu Kim, ...
Abstract. As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel program...
Kyle Spafford, Jeremy S. Meredith, Jeffrey S. Vett...
The speed of many computations is limited not by the number of arithmetic operations but by the time it takes to move and rearrange data in the increasingly complicated memory hie...
Strassen’s matrix multiplication (MM) has benefits with respect to any (highly tuned) implementations of MM because Strassen’s reduces the total number of operations. Strasse...