Contemporary microprocessors implement many iterative algorithms. For example, the front-end of a microprocessor repeatedly fetches and decodes instructions while updating interna...
Mark Aagaard, Robert B. Jones, Roope Kaivola, Kath...
For some sequential loops, existing techniques that form speculative threads only at their loop boundaries do not adequately expose the speculative parallelism inherent in them. T...
Lin Gao 0002, Lian Li 0002, Jingling Xue, Tin-Fook...
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Software pipelining of a multi-dimensional loop is an important optimization that overlaps the execution of successive outermost loop iterations to explore instruction-level paral...
—High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to minimize cache interference by improving the layout of ...