The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays

Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has come, in equal measure, from smaller process technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high-performance architecture implemented in 100nm technology, the optimal clock period is approximately 8 fan-out-of-four (FO4) inverter delays for integer benchmarks, composed of 6 FO4 of useful work and an overhead of about 2 FO4. The optimal clock period for floating-point benchmarks is 6 FO4. We find these optimal points to be insensitive to latch and clock skew overheads. Our study indicates that further pipelining can at best improve the performance of integer programs by a factor of 2 over current designs. At these high clock frequencies, it will be difficult to design the instruction issue window to operate in a single cycle. Consequently, we propose and evaluate a high-frequency design called a segmented instruction window.
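The abstract splits the clock period into useful logic plus a roughly fixed per-stage overhead (8 FO4 = 6 FO4 of work + about 2 FO4). The short Python sketch below illustrates that arithmetic under the simplifying assumption that the overhead stays constant as stages get shorter: as useful work per stage shrinks, overhead consumes a growing share of every cycle. It models the clock period only; the abstract's optima come from simulating SPEC 2000 benchmarks, so they also reflect effects on instructions per cycle that this arithmetic ignores. The helper names and the 48 FO4 logic path are hypothetical, chosen for illustration.

```python
import math


def clock_period_fo4(useful_fo4: float, overhead_fo4: float) -> float:
    """Clock period in FO4 delays: useful logic per stage plus per-stage overhead."""
    return useful_fo4 + overhead_fo4


def overhead_fraction(useful_fo4: float, overhead_fo4: float) -> float:
    """Fraction of each cycle spent on overhead (latch, skew, etc.) instead of useful work."""
    return overhead_fo4 / clock_period_fo4(useful_fo4, overhead_fo4)


def stages_for_path(total_logic_fo4: float, useful_fo4: float) -> int:
    """Hypothetical helper: stages needed to cut a fixed logic path into
    pieces of at most useful_fo4 FO4 each (assumes perfectly divisible logic)."""
    return math.ceil(total_logic_fo4 / useful_fo4)


if __name__ == "__main__":
    OVERHEAD_FO4 = 2.0  # approximate overhead quoted in the abstract
    # Sweep useful work per stage; 6 FO4 is the integer-benchmark optimum from the abstract.
    for useful in (16.0, 12.0, 8.0, 6.0, 4.0, 2.0):
        period = clock_period_fo4(useful, OVERHEAD_FO4)
        frac = overhead_fraction(useful, OVERHEAD_FO4)
        print(f"useful={useful:4.1f} FO4  period={period:4.1f} FO4  overhead={frac:5.1%}")

    # Hypothetical 48 FO4 logic path split at 6 FO4 of useful work per stage.
    print(stages_for_path(48.0, 6.0))  # 8
```

At 6 FO4 of useful work the overhead is already a quarter of each 8 FO4 cycle, and at 2 FO4 it would be half, which is one intuitive reason why ever-deeper pipelining yields diminishing returns.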

M. S. Hrishikesh, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar, Norman P. Jouppi, Keith I. Farkas

Clock | Hardware | ISCA 2002 | Microprocessor Clock Frequency | Optimal Clock Period

Post Info

Added	15 Jul 2010
Updated	15 Jul 2010
Type	Conference
Year	2002
Where	ISCA
Authors	M. S. Hrishikesh, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar, Norman P. Jouppi, Keith I. Farkas
