This paper describes the architecture and implementation of a high-speed decompression engine for embedded processors. The engine is targeted at processors where embedded programs are stored in compressed form and decompressed at runtime during instruction cache refill. The decompression engine uses a unique asynchronous variable-decompression-rate architecture to process Huffman-encoded instructions. The resulting circuit is significantly smaller than comparable synchronous decoders, yet has a higher throughput rate than almost all existing designs. The 0.8 µm layout is all full-custom and contains predominantly dynamic domino logic. The top-level control, as well as several small state machines, is implemented using asynchronous logic. The design operates without a user-supplied clock. Simulations using Lsim show an average throughput of 32 bits/45 ns on the output side, corresponding to about 480 Mbit/sec on the input side. The chip has been manufactured by MOSIS, and tests show that the async...
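The abstract does not describe the decoding circuit itself, so the following is only a rough software analogy, not the authors' asynchronous hardware: it builds a Huffman code over instruction bytes and decodes the bit stream one codeword at a time. The function names `build_code`, `encode`, and `decode`, the byte-wise symbol alphabet, and the sample instruction bytes are hypothetical choices for illustration. The point it shows is that each decoded byte consumes a variable number of input bits, which is the property a variable-rate decoder can exploit.

```python
import heapq
from collections import Counter
from itertools import count


def build_code(freqs):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs a non-empty codeword.
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(data, code):
    """Concatenate the codewords for each byte of the (hypothetical) program."""
    return "".join(code[b] for b in data)


def decode(bits, code):
    """Decode one symbol at a time; because the code is prefix-free, the
    first complete codeword matched is the correct one. Each output byte
    consumes a variable number of input bits."""
    lookup = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical instruction bytes; repeated bytes get shorter codewords.
    program = bytes([0x27, 0xBD, 0xFF, 0xE8, 0xAF, 0xBF, 0x00, 0x14,
                     0x00, 0x00, 0x00, 0x00])
    code = build_code(Counter(program))
    bits = encode(program, code)
    assert decode(bits, code) == program
    print(len(program) * 8, "bits original,", len(bits), "bits compressed")
```

For scale, the quoted output rate of 32 bits per 45 ns works out to about 711 Mbit/s, so the stated input rate of roughly 480 Mbit/s implies compressed code averaging roughly 0.675 of its original size.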
Martin Benes, Andrew Wolfe, Steven M. Nowick