This paper describes the architecture and the performance of a new programmable 16-bit Digital Signal Processor (DSP) engine. It is developed specifically for next generation wireless digital systems and speech applications. Besides providing a basic instruction set, similar to current day 16-bit DSP’s, it contains distinctive architectural features and unique instructions, which make the engine highly efficient for compute-intensive tasks such as vector quantization and Viterbi operations. The datapath contains two Multiply-Accumulate units and one ALU. The external memory bandwidth is kept to two data busses and two corresponding address busses. Still, the internal bus network is designed such that all three units are operating in parallel. This parallelism is reflected in the performance benchmarks. For example, an FIR filter of N taps will take N/2 instruction cycles compared to N for a general purpose 16-bit DSP, and it will require only half the number of memory accesses of...