In this paper we investigate the benefit of scheduling non-critical loads for a higher latency during software pipelining. "Noncritical" denotes those loads that have s...
Sebastian Winkel, Rakesh Krishnaiyer, Robyn Sampso...
This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit ...
John H. Kelm, Daniel R. Johnson, Steven S. Lumetta...
Developing countries face significant challenges in network access, making even simple network tasks unpleasant. Many standard techniques—caching and predictive prefetching— ...
Recent research advocates address-correlating predictors to identify cache block addresses for prefetch. Unfortunately, address-correlating predictors require correlation data sto...
With the widening performance gap between processors and main memory, efficient memory accessing behavior is necessary for good program performance. Loop partition is an effective...