This paper describes a new basis for implementing a shifter functional unit. We present a design based on the inverse butterfly and butterfly datapath circuits that performs the standard shift and rotate operations, as well as the more advanced extract, deposit and mix operations found in some processors. It also supports important new classes of recently proposed, even more advanced bit manipulation instructions: arbitrary bit permutations, bit scatter and bit gather. The new functional unit’s datapath is comparable in latency to that of the classic barrel shifter. It replaces two existing functional units, the shifter and the mix unit, with a single, much more powerful one.
Yedidya Hilewitz, Ruby B. Lee
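
For readers unfamiliar with the bit gather and bit scatter operations named in the abstract, the following is a minimal software sketch of their commonly used semantics. It is an illustration only and is not the hardware datapath described in the paper: the function names `bit_gather` and `bit_scatter` are hypothetical, and the mask-driven, right-justified behavior is assumed here because the abstract does not define the operations.

```python
def bit_gather(x: int, mask: int) -> int:
    """Collect the bits of x selected by mask and pack them at the low end.

    Assumed semantics: bits are taken from low to high mask positions.
    """
    result = 0
    out_pos = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if x & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1           # clear that mask bit
    return result


def bit_scatter(x: int, mask: int) -> int:
    """Spread the low-order bits of x out to the positions selected by mask."""
    result = 0
    in_pos = 0
    while mask:
        lowest = mask & -mask      # next destination position
        if (x >> in_pos) & 1:
            result |= lowest
        in_pos += 1
        mask &= mask - 1
    return result


if __name__ == "__main__":
    x, mask = 0b10110110, 0b11110000
    packed = bit_gather(x, mask)
    print(bin(packed))
    print(bin(bit_scatter(packed, mask)))
```

Under these assumed semantics, scattering a gathered value back through the same mask recovers the masked bits of the original word, which is why the two operations are usually presented as a pair.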