This paper presents the Alpha EV8 conditional branch predictor. The Alpha EV8 microprocessor project, canceled in June 2001 in a late phase of development, envisioned an aggressiv...
The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the per...
Fast networks have made it possible to coordinate distributed heterogeneous CPU, memory, and storage resources to provide a powerful platform for executing high-performance applic...
Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream increasing concurrency and pe...
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...