We studied the reduction in dynamic instruction count for two versions of an MPEG-4 video encoder: a single-threaded, vectorized version and a multi-threaded, non-vectorized version. The multi-threaded version achieves a maximum improvement of approximately 88% with 22 CPU contexts, whereas the single-threaded, vectorized version achieves an 85% improvement over the scalar case with a vector register file length of 24 bytes. We also present VLSI macrocells of a vector accelerator that implements a subset of the MPEG-4 vector ISA, and of a 2-way, parametric, bus-based, cache-coherent SoC multiprocessor.
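To put the percentages in perspective, a percentage reduction in dynamic instruction count can be turned into a baseline-to-optimized count ratio as 100 / (100 − reduction). The sketch below assumes that "improvement" means a percentage reduction in dynamic instruction count relative to the respective baseline. The helper name `reduction_to_ratio` is hypothetical and illustrative only.

```python
def reduction_to_ratio(reduction_pct: float) -> float:
    """Return baseline/optimized dynamic instruction-count ratio.

    Assumption (not stated in the source): 'improvement' is a percentage
    reduction in dynamic instruction count relative to the baseline.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


# Figures reported in the abstract; the ratios are derived, not measured.
for label, pct in [("multi-threaded, 22 CPU contexts", 88.0),
                   ("single-threaded, vectorized", 85.0)]:
    print(f"{label}: {pct:.0f}% reduction -> "
          f"{reduction_to_ratio(pct):.2f}x fewer instructions")
```

Under this reading, an 88% reduction corresponds to roughly 8.3 times fewer executed instructions and an 85% reduction to roughly 6.7 times fewer. Instruction count alone does not determine execution time, so these ratios are not speedups.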
Tom R. Jacobs, José L. Núñez-