We propose a novel methodology to generate Application Specific Instruction Processors (ASIPs) including custom instructions. Our implementation balances performance and area requirements by making custom instructions reusable across similar pieces of code. In addition to arithmetic and logic operations, table look-ups within custom instructions reduce costly accesses to global memory. We present synthesis and cycle-accurate simulation results for six embedded benchmarks running on customised processors. Reusable custom instructions achieve an average 319% speedup with only 5% additional area. The maximum speedup of 501% for the Advanced Encryption Standard (AES) requires only 3.6% additional area.
Robert G. Dimond, Oskar Mencer, Wayne Luk