FPGAs have reached densities that can implement floatingpoint applications, but floating-point operations still require a large amount of FPGA resources. One major component of IEEE compliant floating-point computations is variable length shifters. They account for over 30% of a doubleprecision floating-point adder and 25% of a doubleprecision multiplier. This paper introduces two alternatives for implementing these shifters. One alternative is a coarsegrained approach: embedding variable length shifters in the FPGA fabric. These units provide significant area savings with a modest clock rate improvement over existing architectures. Another alternative is a fine-grained approach: adding a 4:1 multiplexer inside the slices, in parallel to the LUTs. While providing a more modest area savings, these multiplexers provide a significant boost in clock rate with a small impact on the FPGA fabric.
Michael J. Beauchamp, Scott Hauck, Keith D. Underw