The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
With the increasing performance gap between the processor and the memory, the importance of caches is increasing for high performance processors. However, with reducing feature si...
Increasing cache latencies limit L1 cache sizes. In this paper we investigate restrictive compression techniques for level 1 data cache, to avoid an increase in the cache access l...
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...