Abstract. For many applications, cache misses are the primary performance bottleneck. Even though much research has been performed on automatically optimizing cache behavior at the...