Table lookups are among the most frequently used operations in symmetric-key ciphers. In newer algorithms such as the Advanced Encryption Standard (AES) in particular, table lookups often account for the largest fraction of execution time, ranging from 34% to 72% across the five representative ciphers we consider: AES, Blowfish, Twofish, MARS, and RC4. To accelerate and parallelize these lookups, we describe a new parallel table lookup (ptlu) instruction. Our synthesis results indicate that this instruction can be added to a basic RISC processor with no impact on cycle time. We compare the performance of the ptlu instruction with the speedups available through more conventional architectural techniques such as multiple-issue execution, and find that its performance benefits can be far higher than those obtained by increasing the number of instructions executed per cycle in superscalar or VLIW processors.
A. Murat Fiskiran, Ruby B. Lee
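
To illustrate the kind of work the abstract refers to, the following C sketch contrasts a conventional sequence of byte-indexed table lookups with a software emulation of a single operation that performs all of them at once. It is only an illustration under stated assumptions: the abstract does not specify the semantics of ptlu, so the choice of four 256-entry tables, the XOR combination of results, and every name here (`tables`, `lookup_sequential`, `lookup_parallel_emulated`, `selfcheck`) are hypothetical. The table contents are arbitrary values, not real cipher tables.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TABLES 4    /* assumption: four tables, as in table-based AES code */
#define TABLE_SIZE 256  /* one entry for each possible byte index */

static uint32_t tables[NUM_TABLES][TABLE_SIZE];

/* Fill the tables with arbitrary deterministic values (not real cipher tables). */
static void init_tables(void) {
    for (uint32_t t = 0; t < NUM_TABLES; t++)
        for (uint32_t i = 0; i < TABLE_SIZE; i++)
            tables[t][i] = (i * 2654435761u) ^ (t * 0x01010101u);
}

/* Baseline: each lookup needs its own byte extraction, load, and XOR. */
static uint32_t lookup_sequential(uint32_t x) {
    uint32_t r = 0;
    r ^= tables[0][(x >> 24) & 0xff];
    r ^= tables[1][(x >> 16) & 0xff];
    r ^= tables[2][(x >> 8) & 0xff];
    r ^= tables[3][x & 0xff];
    return r;
}

/* Emulation of a hypothetical parallel lookup: every byte of idx selects an
 * entry from its own table and the entries are XORed together. In hardware
 * the iterations of this loop would be carried out simultaneously. */
static uint32_t lookup_parallel_emulated(uint32_t idx) {
    uint32_t r = 0;
    for (int t = 0; t < NUM_TABLES; t++) {
        uint8_t byte = (uint8_t)(idx >> (24 - 8 * t));
        r ^= tables[t][byte];
    }
    return r;
}

/* Check that both formulations agree on a few sample inputs. */
static int selfcheck(void) {
    const uint32_t samples[] = { 0x00000000u, 0x01020304u, 0xdeadbeefu, 0xffffffffu };
    init_tables();
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(lookup_sequential(samples[i]) == lookup_parallel_emulated(samples[i]));
    return 1;
}
```

Calling `selfcheck()` from a test harness confirms that the two functions compute the same value. The difference is that the sequential version issues a separate extract, load, and combine step for each table, which is the cost a dedicated parallel lookup instruction would aim to collapse.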