Clustered machines partition hardware resources to circumvent the cycle time penalties incurred by large, monolithic structures. This partitioning introduces a long inter-cluster ...
A critical problem in wide-issue superscalar processors is the limit on cycle time imposed by the central register file and operand bypass network. In this paper, a distributed re...
Santithorn Bunchua, D. Scott Wills, Linda M. Wills
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, ...
Subbarao Palacharla, Norman P. Jouppi, James E. Sm...
Fast, accurate, and effective performance analysis is essential for the design of modern processor architectures and improving application performance. Recent trends toward highly...
Ramadass Nagarajan, Xia Chen, Robert G. McDonald, ...
As semiconductor feature sizes decrease, interconnect delay is becoming a dominant component of processor cycle times. This creates a critical need to shift microarchitectural des...