Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we imp...
Jiwei Lu, Howard Chen, Rao Fu, Wei-Chung Hsu, Bobb...
With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, registe...
Energy-efficient processor design is becoming more and more important with technology scaling and with high performance requirements. Supply-voltage scaling is an efficient way to...
Hai Li, Chen-Yong Cher, T. N. Vijaykumar, Kaushik ...
Single-event upsets from particle strikes have become a key challenge in microprocessor design. Techniques to deal with these transient faults exist, but come at a cost. Designers...
Shubhendu S. Mukherjee, Christopher T. Weaver, Joe...
This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cor...
Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, P...
Many dynamic optimization and/or binary translation systems hold optimized/translated superblocks in a code cache. Conventional code caching systems suffer from overheads when con...
Ensuring back-to-back execution of dependent instructions in a conventional out-of-order processor requires scheduling logic that wakes up and selects instructions at the same rat...
Abstract—High-speed routers often use commodity, fully-associative, TCAMs (Ternary Content Addressable Memories) to perform packet classification and routing (IP lookup). We prop...
A dynamic optimizer is a runtime software system that groups a program’s instruction sequences into traces, optimizes those traces, stores the optimized traces in a softwarebase...
The impact of pipeline length on both the power and performance of a microprocessor is explored both theoretically and by simulation. A theory is presented for a wide range of pow...