Level one cache normally resides on a processor’s critical path, which determines the clock frequency. Directmapped caches exhibit fast access time but poor hit rates compared w...
—This paper presents a design space exploration of a selective load value prediction scheme suitable for energyaware Simultaneous Multi-Threaded (SMT) architectures. A load value...
A microprocessor's performance is fundamentally limited by the rate at which it can resolve branch mispredictions. Control independence (CI) architectures look for useful con...
Kshitiz Malik, Mayank Agarwal, Sam S. Stone, Kevin...
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
Software code caches help amortize the overhead of dynamic binary transformation by enabling reuse of transformed code. Since code caches contain a potentiallyaltered copy of ever...