The floating-point unit of the next-generation PowerPC is detailed. It has been tested at over 5 GHz. The design supports an extremely aggressive cycle time of 13 FO4, a technology-independent measure. For most dependent instructions, its fused multiply-add dataflow has only 6 effective pipeline stages. This is nearly equivalent to its predecessor, the Power 5, even though the technology-independent frequency has increased by over 70%; overall, the frequency has improved by over 100%. The unit achieves this performance through aggressive feedback paths, circuit design, and layout. The pipeline has 7 stages, but data may be fed back to dependent operations before rounding and complete normalization. Division and square root algorithms are also described; they take advantage of high-precision linear approximation hardware to obtain a reciprocal or reciprocal square root approximation.
Son Dao Trong, Martin S. Schmookler, Eric M. Schwa
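As a rough software illustration of the division idea mentioned in the abstract, the sketch below seeds a reciprocal with a piecewise-linear approximation and then refines it. The abstract does not give the table size, the refinement scheme, or the rounding procedure, so everything here is an assumption: the 64-segment chord table, the Newton-Raphson refinement (used as a stand-in), and the names `build_table`, `reciprocal_estimate`, and `divide` are all hypothetical, and the result is not guaranteed to be correctly rounded.

```python
import math

TABLE_BITS = 6  # assumed: 2**6 linear segments over [1, 2); not from the source


def build_table(bits=TABLE_BITS):
    """Return (slope, intercept) pairs approximating 1/x by a chord per segment."""
    n = 1 << bits
    table = []
    for i in range(n):
        x0 = 1.0 + i / n
        x1 = 1.0 + (i + 1) / n
        slope = (1.0 / x1 - 1.0 / x0) / (x1 - x0)
        intercept = 1.0 / x0 - slope * x0
        table.append((slope, intercept))
    return table


TABLE = build_table()


def reciprocal_estimate(m):
    """Linear approximation of 1/m for a significand m in [1, 2)."""
    idx = int((m - 1.0) * len(TABLE))  # leading fraction bits select the segment
    slope, intercept = TABLE[idx]
    return slope * m + intercept  # one multiply-add, as an FMA unit could do


def divide(a, b, iterations=2):
    """Approximate a / b from a linear reciprocal seed plus refinement (sketch)."""
    if b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
        raise ValueError("sketch handles finite operands and nonzero divisors only")
    mant, exp = math.frexp(abs(b))  # abs(b) = mant * 2**exp, mant in [0.5, 1)
    m = mant * 2.0                  # rescale significand to [1, 2)
    e = exp - 1
    y = reciprocal_estimate(m)
    for _ in range(iterations):
        # Newton-Raphson step for 1/m (assumed stand-in, not the paper's method)
        y = y * (2.0 - m * y)
    recip = math.copysign(math.ldexp(y, -e), b)  # restore exponent and sign
    return a * recip
```

For example, `divide(1.0, 3.0)` should agree with `1.0 / 3.0` to within a few units in the last place. Each refinement step roughly squares the relative error, so a more accurate linear seed means fewer dependent multiply-add steps, which is presumably why a high-precision approximation is attractive in a deeply pipelined unit.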