Software-controlled data prefetching is a promising technique for improving the performance of the memory subsystem to match today's high-performance processors. While prefet...
Access/execute architectures have several advantages over more traditional architectures. Because address generation and memory access are decoupled from operand use, memory laten...
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance since some applicatio...
Arbitrary memory dependencies and variable-latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOM...
In this paper we investigate the benefit of scheduling non-critical loads for a higher latency during software pipelining. "Noncritical" denotes those loads that have s...
Sebastian Winkel, Rakesh Krishnaiyer, Robyn Sampso...