We propose two new instructions, swperm and sieve, that can be used to efficiently complete an arbitrary bit-level permutation of an n-bit word with or without repetitions. Permutations with repetitions are rearrangements of an ordered set in which elements may replace other elements in the set; such permutations are useful in cryptographic algorithms. On a 4-way superscalar processor, an arbitrary 64-bit permutation with repetitions of 1-bit subwords can be completed in 11 instructions and only 4 cycles using the two proposed instructions. For subwords of size 4 bits or greater, an arbitrary permutation with repetitions of a 64-bit register can be completed in a single cycle using a single swperm instruction. This improves upon previous permutation instruction proposals that require log(r) sequential instructions to permute r subwords of a 64-bit word without repetitions. Our method requires fewer instructions to permute 4-bit or larger subwords packed in a 64-bit register and fewer ...
John Patrick McGregor, Ruby B. Lee
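
To make the abstract's notion of a permutation with repetitions concrete, the following is a minimal software sketch rather than the proposed hardware instruction. The function name `permute_subwords`, the selector-list encoding, and the convention that subword 0 is the least significant are all assumptions made for illustration; the abstract does not specify the actual operand format of swperm or sieve.

```python
MASK64 = (1 << 64) - 1


def permute_subwords(word, selectors, width=4):
    """Hypothetical model: destination subword i receives source subword selectors[i].

    Because a source index may appear more than once in `selectors`,
    this models a permutation with repetitions. Subword 0 is assumed
    to be the least significant one.
    """
    if 64 % width != 0:
        raise ValueError("width must divide 64")
    count = 64 // width
    if len(selectors) != count:
        raise ValueError(f"expected {count} selectors, got {len(selectors)}")

    sub_mask = (1 << width) - 1
    result = 0
    for dest, src in enumerate(selectors):
        if not 0 <= src < count:
            raise ValueError(f"selector {src} out of range")
        # Extract the chosen source subword and place it at the destination slot.
        value = (word >> (src * width)) & sub_mask
        result |= value << (dest * width)
    return result & MASK64


word = 0x0123456789ABCDEF

# Ordinary permutation (no repetitions): reverse the sixteen 4-bit subwords.
reversed_nibbles = permute_subwords(word, list(range(15, -1, -1)))

# Permutation with repetitions: every destination copies source subword 0.
broadcast = permute_subwords(word, [0] * 16)

# Bit-level case: width=1 treats each bit as its own subword.
bit_reversed = permute_subwords(1, list(range(63, -1, -1)), width=1)
```

In this model a single call handles any subword width that divides 64. It says nothing about instruction counts or cycle counts, which depend on the hardware design the abstract summarizes.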