Abstract. Many applications with large data spaces that cannot run on a typical workstation (due to page faults) call for techniques to expand the effective memory size. One such t...
In order to meet the requirements concerning both performance and energy consumption in embedded systems, new memory architectures are being introduced. Beside the well-known use o...
Abstract. In this paper, we propose a processor architecture with programmable on-chip memory for a high-performance SMP (symmetric multi-processor) node named SCIMA-SMP (Software ...
Abstract. Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memor...
Stephen Somogyi, Thomas F. Wenisch, Nikolaos Harda...
The size and speed of first-level caches and SRAMs of embedded processors continue to increase in response to demands for higher performance. In power-sensitive devices like PDAs a...
During a program’s runtime, the stack and data segments of the main memory often contain much redundancy, which makes them good candidates for compression. Compression and decomp...
High-performance out-of-order processors spend a significant portion of their execution time on the incorrect program path even though they employ aggressive branch prediction al...
Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale ...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
User-controllable coherence revives the idea of cooperation between software and hardware in an attempt to bridge the gap between efficient small-scale shared memory machines and m...
Abstract. The large and growing impact of memory hierarchies on overall system performance compels designers to investigate innovative techniques to improve memory-system efficienc...