Sciweavers

TMM
2008

Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors

Abstract: The 2-D Discrete Wavelet Transform (DWT) consumes up to 68% of the JPEG2000 encoding time. In this paper, we develop efficient implementations of this important kernel on general-purpose processors (GPPs), in particular the Pentium 4 (P4). Efficient implementations of the 2-D DWT on the P4 must address three issues. First, the P4 suffers from a problem known as 64K aliasing, which can degrade performance by an order of magnitude. We propose two techniques that avoid 64K aliasing and improve performance by a factor of up to 4.20. Second, a straightforward implementation of vertical filtering incurs many cache misses. Cache performance can be improved by applying loop interchange, but there will still be many conflict misses if the filter length exceeds the cache associativity. Two methods are proposed to reduce the number of conflict misses which provide ...
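The loop interchange mentioned in the abstract can be illustrated with a minimal C sketch. This is not the authors' implementation: the function names, the row-major `float` layout, and the caller-supplied `taps` array are assumptions made for illustration, and boundary extension, downsampling, SIMD, and the 64K-aliasing and conflict-miss remedies are all left out. The input is assumed to hold `out_rows + ntaps - 1` rows.

```c
#include <stddef.h>

/* Straightforward vertical filtering of a row-major image, one column
   at a time. Successive reads are `width` elements apart, so each one
   tends to touch a different cache line. */
void vfilter_columnwise(const float *in, float *out,
                        size_t width, size_t out_rows,
                        const float *taps, size_t ntaps)
{
    for (size_t c = 0; c < width; c++) {
        for (size_t r = 0; r < out_rows; r++) {
            float acc = 0.0f;
            for (size_t k = 0; k < ntaps; k++)
                acc += taps[k] * in[(r + k) * width + c];
            out[r * width + c] = acc;
        }
    }
}

/* Same arithmetic after loop interchange: the column loop is innermost,
   so every inner iteration walks contiguous memory within one row. */
void vfilter_interchanged(const float *in, float *out,
                          size_t width, size_t out_rows,
                          const float *taps, size_t ntaps)
{
    for (size_t r = 0; r < out_rows; r++) {
        float *dst = out + r * width;
        for (size_t c = 0; c < width; c++)
            dst[c] = 0.0f;
        for (size_t k = 0; k < ntaps; k++) {
            const float *src = in + (r + k) * width;
            const float t = taps[k];
            for (size_t c = 0; c < width; c++)
                dst[c] += t * src[c];
        }
    }
}
```

Both functions compute the same output; only the memory access order differs. In the interchanged version each output row still reads `ntaps` input rows, which is consistent with the abstract's remark that conflict misses can remain when the filter length exceeds the cache associativity.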
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Where TMM
Authors Asadollah Shahbahrami, Ben H. H. Juurlink, Stamatis Vassiliadis