Current superscalar processors, both RISC and CISC, require substantial instruction fetch and decode bandwidth to keep multiple functional units utilized. While CISC instructions can sometimes provide reduced fetch bandwidth requirements, they are correspondingly more difficult to decode. A hardware assist, called a fill unit, can dynamically collect decoded microoperations into a decoded instruction cache. Future code fetches to those locations can be satisfied out of this cache and thus bypass the decoding logic. This approach is investigated using the Intel x86 architecture, and a speedup of approximately a factor of two over a P6-like decoding structure is seen for the three SPEC benchmarks investigated. This design is accompanied by a microengine-register allocation and renaming scheme that prevents the increased supply of microoperations from placing excessive demands on the normal register renaming hardware.
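
The fetch path described above can be illustrated with a minimal sketch. It is a toy software model under stated assumptions, not the hardware design evaluated here: the names `DecodedInstructionCache`, `fetch`, and `toy_decode` are hypothetical, the cache is unbounded with no replacement policy, each entry holds the microoperations of a single instruction, and register renaming is not modeled. It shows only the central idea that a fetch which hits in the decoded cache bypasses the decoder, while a miss goes through the decoder and lets the fill unit record the result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DecodedInstructionCache:
    """Toy model: maps a fetch address to already-decoded microoperations."""

    decode: Callable[[int], List[str]]  # slow path (hypothetical decoder)
    lines: Dict[int, List[str]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def fetch(self, address: int) -> List[str]:
        cached = self.lines.get(address)
        if cached is not None:
            # Hit: microoperations are supplied without invoking the decoder.
            self.hits += 1
            return cached
        # Miss: decode on the slow path, then let the "fill unit" store the
        # decoded microoperations for later fetches of the same address.
        self.misses += 1
        uops = self.decode(address)
        self.lines[address] = uops
        return uops


def toy_decode(address: int) -> List[str]:
    """Assumed stand-in for a decoder: two microoperations per instruction."""
    return [f"uop@{address:#x}.{i}" for i in range(2)]


cache = DecodedInstructionCache(decode=toy_decode)
# A short loop body fetched repeatedly: only the first pass needs the decoder.
for address in [0x100, 0x104, 0x100, 0x104, 0x100]:
    cache.fetch(address)
```

In this model the decoder is invoked only for the first fetch of each address. A real design would also have to bound capacity, pack several instructions per line, and handle invalidation, none of which is attempted in this sketch.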