Conventional microarchitectures choose a single memory hierarchy design point targeted at the average application. In this paper, we propose a cache and TLB layout and design that...
Rajeev Balasubramonian, David H. Albonesi, Alper B...
Data, addresses, and instructions are compressed by maintaining only significant bytes with two or three extension bits appended to indicate the significant byte positions. This s...
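As a rough illustration of the significant-byte idea in the abstract above, the sketch below handles only the two-extension-bit case for a 32-bit word. It assumes the two bits encode how many low-order bytes are kept and that dropped high-order bytes are pure sign extension. The function names and the exact encoding are hypothetical, not taken from the paper.

```python
def compress_word(value: int) -> tuple[int, bytes]:
    """Return (ext_bits, significant_bytes) for a 32-bit two's-complement word.

    Assumption: ext_bits (0..3) means "ext_bits + 1 low-order bytes are kept".
    """
    raw = (value & 0xFFFFFFFF).to_bytes(4, "little")
    n = 4
    # Drop the top byte while it merely repeats the sign of the byte below it.
    while n > 1:
        sign_fill = 0xFF if raw[n - 2] & 0x80 else 0x00
        if raw[n - 1] != sign_fill:
            break
        n -= 1
    return n - 1, raw[:n]


def decompress_word(ext_bits: int, data: bytes) -> int:
    """Sign-extend the kept bytes back to a 32-bit word."""
    n = ext_bits + 1
    return int.from_bytes(data[:n], "little", signed=True) & 0xFFFFFFFF


if __name__ == "__main__":
    for v in (5, -1, 0x80, 0x12345678):
        ext, kept = compress_word(v)
        print(hex(v & 0xFFFFFFFF), "->", ext, kept.hex())
```

Under these assumptions, small positive and negative values shrink to a single byte plus the two extension bits, while a value that uses all four bytes is stored uncompressed.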
Dynamically Linked Libraries (DLLs) promote software modularity, portability, and flexibility, and their use has become widespread. In this paper, we characterize the behavior of f...
Stevan A. Vlaovic, Edward S. Davidson, Gary S. Tys...
Dynamic Zero Compression reduces the energy required for cache accesses by only writing and reading a single bit for every zero-valued byte. This energy-conscious compression is i...
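The single-bit-per-zero-byte idea can be sketched in software as follows. This is a minimal model, assuming one indicator bit is kept for every byte and that a nonzero byte costs its indicator bit plus eight data bits. The function names and the bit-counting model are illustrative assumptions, not the paper's circuit design.

```python
def dzc_encode(data: bytes) -> tuple[list[int], bytes]:
    """Split data into per-byte zero-indicator bits and the nonzero bytes."""
    zero_bits = [1 if b == 0 else 0 for b in data]
    payload = bytes(b for b in data if b != 0)
    return zero_bits, payload


def dzc_decode(zero_bits: list[int], payload: bytes) -> bytes:
    """Rebuild the original bytes from indicator bits and nonzero bytes."""
    nonzero = iter(payload)
    return bytes(0 if z else next(nonzero) for z in zero_bits)


def bits_accessed(zero_bits: list[int]) -> int:
    """Assumed cost model: 1 bit for a zero byte, 1 + 8 bits otherwise."""
    return sum(1 if z else 9 for z in zero_bits)


if __name__ == "__main__":
    line = bytes([0, 0, 5, 0])
    zero_bits, payload = dzc_encode(line)
    print(zero_bits, payload.hex(), bits_accessed(zero_bits), "of", 8 * len(line))
```

In this model the savings depend entirely on how many bytes are zero: a line with no zero bytes touches more bits than the uncompressed case because of the added indicator bits.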
A machine's performance is the product of its IPC (Instructions Per Cycle) and clock frequency. Recently, Palacharla, Jouppi, and Smith [3] warned that the dynamic instruction s...
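The product relation stated in the abstract above can be made concrete with a small calculation. The two design points below are made-up numbers chosen only to show that a lower-IPC design can still win if its clock is fast enough. They are not results from the paper.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Performance as the product of IPC and clock frequency."""
    return ipc * clock_hz


if __name__ == "__main__":
    # Hypothetical design points, for illustration only.
    wide_slow = instructions_per_second(ipc=2.0, clock_hz=1.0e9)
    narrow_fast = instructions_per_second(ipc=1.5, clock_hz=1.5e9)
    print(wide_slow, narrow_fast, narrow_fast > wide_slow)
```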
An effective method for reducing the effect of load latency in modern processors is data prefetching. One form of data prefetching, stream buffers, has been shown to be particular...
Selective dynamic compilation systems, typically driven by annotations that identify run-time constants, can achieve significant program speedups. However, manually inserting ann...
A slipstream processor reduces the length of a running program by dynamically skipping computation non-essential for correct forward progress. The shortened program runs faster as...
Zachary Purser, Karthik Sundaramoorthy, Eric Roten...
Register promotion is an optimization that allocates a value to a register for a region of its lifetime where it is provably not aliased. Conventional compiler analysis cannot alw...