Bit and subword permutations are useful in many multimedia and cryptographic applications. New shift and permute instructions have been added to the instruction sets of general-purpose microprocessors to implement the required data permutations efficiently. This paper examines the design of a high-speed bit permutation unit. The proposed architecture is derived by mapping the functionality of one of the most powerful bit permutation instructions (GRP) onto a new, enhanced bitonic sorting network. The proposed design achieves delay reductions of more than 20% compared with previously presented solutions, and its regularity enables efficient VLSI implementations.
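
For readers unfamiliar with GRP, the following is a minimal software reference model of what the instruction computes. It assumes the usual definition from the bit-permutation literature, which the abstract itself does not spell out: data bits whose control bit is 0 are gathered at the left (most significant) end, data bits whose control bit is 1 are gathered at the right end, and the relative order within each group is preserved. The function name `grp`, its parameters, and the 8-bit default width are illustrative choices, not taken from the paper, and the sketch says nothing about the proposed hardware structure.

```python
def grp(data: int, control: int, width: int = 8) -> int:
    """Bit-level reference model of GRP on a `width`-bit word (assumed semantics).

    Bits of `data` with control bit 0 move to the left (MSB) end,
    bits with control bit 1 move to the right (LSB) end,
    and the original order is kept inside each group.
    """
    zeros = []  # data bits selected by control bit 0
    ones = []   # data bits selected by control bit 1
    for i in range(width - 1, -1, -1):  # scan from MSB down to LSB
        bit = (data >> i) & 1
        if (control >> i) & 1:
            ones.append(bit)
        else:
            zeros.append(bit)

    # Concatenate the two groups, MSB first, back into an integer.
    result = 0
    for bit in zeros + ones:
        result = (result << 1) | bit
    return result


if __name__ == "__main__":
    # data    = 1 0 1 1 0 0 1 0
    # control = 0 1 0 1 0 1 0 1
    print(format(grp(0b10110010, 0b01010101), "08b"))
```

Under this definition, GRP amounts to a stable sort of the data bits keyed by their control bits, which is the property that makes a sorting network a natural starting point for a hardware implementation.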