Usual cache optimisation techniques for high performance computing are difficult to apply in embedded VLIW applications. First, embedded applications are not always well structur...
Samir Ammenouche, Sid Ahmed Ali Touati, William Ja...
The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache p...
The load/store queue (LSQ) is one of the most complex parts of contemporary processors. Its latency is critical for the processor performance and it is usually one of the processo...
Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures , such as the Intel P6, the ROB also serves as a repositor...
Gurhan Kucuk, Dmitry Ponomarev, Oguz Ergin, Kanad ...
Sparse matrix-vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-...