We propose a technique for reducing the energy required by firmware code to execute on embedded systems. The method is based on the idea of compressing the most commonly executed instructions so as to reduce the energy dissipated in memory accesses. Instruction decompression is performed on the fly by a hardware module located between the processor and memory; no changes to the processor architecture are required. Hence, our technique is well suited for systems employing IP cores whose internal architecture cannot be modified. We describe a number of decompression schemes and architectures that effectively trade off hardware complexity for memory energy and bandwidth reduction, as shown by experimental data collected by executing several sample programs.
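As an illustration of the general idea only, the following minimal sketch models a dictionary-based scheme in software: the most frequently executed instruction words are replaced by short indices, and a lookup table stands in for the hardware decompressor. The 32-bit instruction width, the 1-byte index, the dictionary size, and all function names are assumptions made for this example, not details taken from the schemes described in the paper. The sketch also ignores the cost of marking which entries are compressed.

```python
from collections import Counter

WORD_BYTES = 4   # assumed instruction width (32-bit words)
INDEX_BYTES = 1  # assumed width of a compressed index


def build_dictionary(trace, size=256):
    """Return the `size` most frequently executed instruction words."""
    return [word for word, _ in Counter(trace).most_common(size)]


def compress(trace, dictionary):
    """Encode each fetched word as (True, index) if it is in the
    dictionary, otherwise as (False, word) left uncompressed."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [(True, index[w]) if w in index else (False, w) for w in trace]


def decompress(stream, dictionary):
    """Software stand-in for the hardware decompressor: expand each
    index back to the original word, pass other words through."""
    return [dictionary[v] if is_index else v for is_index, v in stream]


def fetched_bytes(stream):
    """Count bytes read from memory for an encoded fetch stream."""
    return sum(INDEX_BYTES if is_index else WORD_BYTES
               for is_index, _ in stream)


# Hypothetical execution trace: two hot instructions and one cold one.
trace = [0xE3A00000] * 6 + [0xE2800001] * 3 + [0xEAFFFFFE]
dictionary = build_dictionary(trace, size=2)
stream = compress(trace, dictionary)
restored = decompress(stream, dictionary)
```

In this toy trace the processor still sees the original instruction words after decompression, while the number of bytes read from memory drops because nine of the ten fetches are served from 1-byte indices. A larger dictionary would compress more fetches at the cost of a larger lookup table, which is the kind of hardware complexity versus memory traffic trade-off the abstract refers to.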