To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. ...
William Jalby, Christophe Lemuet, Sid Ahmed Ali To...
Recent superscalar processors issue four instructions per cycle. These processors are also powered by highly-parallel superscalar cores. The potential performance can only be expl...
Thomas M. Conte, Kishore N. Menezes, Patrick M. Mi...
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Types play an important role both in reasoning about Haskell and for its implementation. For example, the Glasgow Haskell Compiler performs certain fusion transformations that are...
Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data...