Complex arithmetic computations, especially if derived from bit-level software descriptions, can be very inefficient if implemented directly in hardware (e.g., by translating the relevant C section into VHDL or Verilog). In this paper we show that known arithmetic optimisation techniques are in some cases insufficient to achieve the high-performance implementation that a designer could produce through careful study of the computation. We therefore introduce an algorithm that restructures dataflow graphs so that they can be synthesized into high-quality arithmetic circuits, especially when arithmetic operations are interspersed with logic operations. On typical software benchmarks, the new technique reduces the critical path by around 20
Paolo Ienne, Ajay K. Verma