Numerical applications frequently contain nested loop structures that process large arrays of data. The execution of these loop structures often produces memory preference pattern...
Yoji Yamada, John Gyllenhall, Grant Haab, Wen-mei ...
This paper presents an examination of different cache and processor configurations assuming transistor densities will continue to increase as they have in the past. While in the s...
Matthew K. Farrens, Gary S. Tyson, Andrew R. Plesz...
Instruction scheduling in general, and software pipelining in particular face the di cult task of scheduling operations in the presence of uncertain latencies. The largest contrib...
This paper evaluates the bene t of adding a shared cache to the network interface as a means of improving the performance of networked workstations con gured as a distributed shar...
John K. Bennett, Katherine E. Fletcher, William Ev...
In this paper, we propose a cache design that provides the same miss rate as a two-way set associative cache, but with a access time closer to a direct-mapped cache. As with other...
3rd IEEE Real-time Technology and Applications Symposium (RTAS), June 1997 in Montreal, Canada Cache-partitioning techniques have been invented to make modern processors with an e...
Highly aggressive multi-issue processor designs of the past few years and projections for the next decade require that we redesign the operation of the cache memory system. The nu...
Jude A. Rivers, Gary S. Tyson, Edward S. Davidson,...
In this paper we evaluate the performance of high bandwidth caches that employ multiple ports, multiple cycle hit times, on-chip DRAM, and a line buffer to find the organization t...
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
Brad Calder, Chandra Krintz, Simmi John, Todd M. A...
Software-controlled cache prefetching and data forwarding are widely used techniques for tolerating memory latency in shared memory multiprocessors. However, some previous studies...