Image processing applications tend to access their data non-sequentially and reuse that data infrequently. As a result, they tend to perform poorly on conventional memory systems ...
Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sall...
This paper focuses on the problem of how to find and effectively exploit speculative thread-level parallelism. Our studies show that speculating only on loops does not yield suffi...
Jeffrey T. Oplinger, David L. Heine, Monica S. Lam
This paper explores microarchitecture models for a simultaneous multithreaded processor with multimedia enhancements. We enhance a wide-issue superscalar processor by the simultan...
Existing techniques can enhance the locality of arrays indexed by affine functions of induction variables. This paper presents a technique to localize non-affine array references,...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural m...
Lori Carter, Beth Simon, Brad Calder, Larry Carter...
In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the fl...
Ramon Canal, Joan-Manuel Parcerisa, Antonio Gonz&a...
This paper presents the Cameron Project 1 , which aims to provide a high level, algorithmic language and optimizing compiler for the development of image processing applications o...