In retargeting loop-based code for multimedia instruction set extensions, a critical issue is that vector data types of mixed precision within a loop body complicate the paralleli...
This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and ...
Existing SIMD extensions cannot efficiently vectorize the histogram function due to memory collisions. We propose two techniques to avoid this problem. In the first, a hierarchi...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...
We describe an approach to verifying bit-level pipelined machine models using a combination of deductive reasoning and decision procedures. While theorem proving systems such as AC...