Sciweavers

ISCA
2002
IEEE

The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays

14 years 4 months ago
The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays
Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has been provided, in equal measure, by smaller technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high-performance architecture implemented in 100nm technology, the optimal clock period is approximately 8 fan-out-of-four (FO4) inverter delays for integer benchmarks, comprised of 6 FO4 of useful work and an overhead of about 2 FO4. The optimal clock period for floatingpoint benchmarks is 6FO4. We find these optimal points to be insensitive to latch and clock skew overheads. Our study indicates that further pipelining can at best improve performance of integer programs by a factor of 2 over current designs. At these high clock frequencies it will be difficult to design the instruction issue window to operate in a single cycle. Consequently, we propose and evaluate a high-frequency design called a segmented instruction window.
M. S. Hrishikesh, Doug Burger, Stephen W. Keckler,
Added 15 Jul 2010
Updated 15 Jul 2010
Type Conference
Year 2002
Where ISCA
Authors M. S. Hrishikesh, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar, Norman P. Jouppi, Keith I. Farkas
Comments (0)