Address correlation is a technique that links the addresses that reference the same data values. Using a detailed source-code level analysis, a recent study [1] revealed that diffe...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
Clustered microarchitectures are an effective organization to deal with the problem of wire delays and complexity by partitioning some of the processor resources. The organization ...
With the increasing performance gap between the processor and the memory, the importance of caches is increasing for high performance processors. However, with reducing feature si...
While caches have become invaluable for higher-end architectures due to their ability to hide, in part, the gap between processor speed and memory access times, caches (and partic...
In many computer systems with large data computations, the delay of memory access is one of the major performance bottlenecks. In this paper, we propose an enhanced field remappi...
Keoncheol Shin, Jungeun Kim, Seonggun Kim, Hwansoo...
Active power used to be the primary contributor to total power dissipation of CMOS designs, but with the technology scaling, the share of leakage in total power consumption of dig...
Hamid Noori, Maziar Goudarzi, Koji Inoue, Kazuaki ...
We present an application-driven customization methodology for energy-efficient inter-core communication in embedded multiprocessors. The methodology leverages configurable cach...
In this paper, we propose a formal analysis approach to estimate the expected (average) data cache access time of an application across all possible program inputs. Towards this g...
Future embedded systems are expected to use chip-multiprocessors to provide the execution power for increasingly demanding applications. Multiprocessors increase the pressure on th...
Martin Schoeberl, Wolfgang Puffitsch, Benedikt Hub...