On-chip multiprocessors can be an alternative to the wide-issue superscalar processor approach, which is currently the mainstream way to exploit the increasing number of transistors on a silicon chip. Utilization of the cache, especially for remote data, is important in systems using such on-chip multiprocessors, since the ratio of off-chip to on-chip memory access latencies is higher than in traditional board-level implementations of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors. We examine two options for utilizing the cache resources of on-chip multiprocessors, whose size is constrained by the die area: (1) caching instructions and/or private data only in the L1 cache to leave more space in the L2 cache for shared data; (2) dividing the cache area between the L2 and remote victim caches, or using all of the area for the L2 cache. Results of execution-driven simulations show that the first option improved performance by up to 15%. For the second option, a ...
Hitoshi Oi, N. Ranganathan