An accurate, tractable, analytic cache model for time-shared systems is presented, which estimates the overall cache missrate of a multiprocessing system with any cache size and t...
The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache p...
A large logical register file is important to allow effective compiler transformations or to provide a windowed space of registers to allow fast function calls. Unfortunately, a l...
Matt Postiff, David Greene, Steven E. Raasch, Trev...
Recent proposals for Chip Multiprocessors (CMPs) advocate speculative, or implicit, threading in which the hardware employs prediction to peel off instruction sequences (i.e., imp...
Chong-liang Ooi, Seon Wook Kim, Il Park, Rudolf Ei...
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...
For most parallel and high performance systems, tuning guides provide the users with advices to optimize the execution time of their programs. Execution time may be very sensitive...