The Complex Streamed Instruction (CSI) set is an instruction set extension targeted at multimedia applications. CSI instructions process two-dimensional data streams stored in memory and the streams can be of any length. Sectioning (the process of splitting up arbitrary-length streams into fixed-size sections that fit in a vector register), data alignment, and conversion between different packed data types are all performed in hardware. It has been shown previously that CSI provides significant speedups compared to current media ISA extensions such as MMX and VIS. This paper presents a detailed design of a unit that can execute CSI instructions under the assumption that it is interfaced with the first-level data cache. In particular, it is shown that the complex, two-dimensional, address-generation calculations can be performed in a pipelined fashion and implemented using a three-stage pipeline with acceptable delay and hardware cost. Ó 2003 Elsevier B.V. All rights reserved.
Dmitry Cheresiz, Ben H. H. Juurlink, Stamatis Vass