As technological process shrinks and clock rate increases, instruction caches can no longer be accessed in one cycle. Alternatives are implementing smaller caches (with higher mis...
This paper presents a new technique for efficient usage of small trace caches. A trace cache can significantly increase the performance of wide out-oforder processors, but to be e...
In this paper, we present a hierarchical Data Cache Architecture called DCA to effectively slash local interconnect traffic and thus boost the storage server performance. DCA is ...
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache’s high impact on system performance and power, and because of th...
Conventional cache models are not suited for real-time parallel processing because tasks may flush each other’s data out of the cache in an unpredictable manner. In this way th...
Anca Mariana Molnos, Marc J. M. Heijligers, Sorin ...
In this paper, we investigate the impact of Tox and Vth on power performance trade-offs for on-chip caches. We start by examining the optimization of the various components of a s...
Robert Bai, Nam Sung Kim, Taeho Kgil, Dennis Sylve...
Simultaneous Multithreaded (SMT) processors improve the instruction throughput by allowing fetching and running instructions from several threads simultaneously at a single cycle....
Cache locality optimization is an efficient way for reducing the idle time of modern processors in waiting for needed data. This kind of optimization can be achieved either on the...
Memory management is a fundamental problem in computer architecture and operating systems. We consider a two-level memory system with fast, but small cache and slow, but large mai...
As transistors continue to scale down into the nanometer regime, device leakage currents are becoming the dominant cause of power dissipation in nanometer caches, making it essent...