The widespread use of wavelet transforms, as in JPEG2000, demands efficient implementations on general-purpose computers as well as on dedicated hardware. The increasing availability of SIMD technologies poses a considerable challenge, since efficient SIMD parallelizations are not trivial. This work presents a parallelized 2-D wavelet transform that follows a single-loop approach, i.e. a loop fusion of the lifting steps of horizontal filtering, with horizontal and vertical filtering interleaved for optimal temporal locality. In this way, each input value is read only once and each output value is written once, without subsequent updates. Such an approach turns out to be a necessary basis for an efficient SIMD parallelization. Results are obtained on a general-purpose processor with a 4-fold single-precision SIMD extension. Speedups of about 3.7 due to the use of SIMD, 2.55 due to the single-loop approach, and up to 6 due to cache effects for pathological data sizes are obtained, giving total speedups of up...
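To make the idea of fusing lifting steps concrete, the following minimal sketch contrasts a conventional two-pass lifting scheme with a fused single-loop version for one row of data. It is an illustration under stated assumptions, not the implementation evaluated in this work: the text above does not name a filter, so the sketch assumes the two-step 5/3 lifting filter used in JPEG2000, an even-length input, symmetric boundary extension, and scalar Python instead of SIMD code. The function names `lift53_two_pass` and `lift53_fused` are hypothetical.

```python
def lift53_two_pass(x):
    """Reference version: one full pass per lifting step (assumed 5/3 filter)."""
    n = len(x)
    m = n // 2  # assumption: n is even and n >= 2
    d = [0.0] * m
    # Predict step: detail coefficients from odd samples.
    for i in range(m):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # symmetric extension
        d[i] = x[2 * i + 1] - 0.5 * (x[2 * i] + right)
    s = [0.0] * m
    # Update step: a second pass that re-reads x and the stored details.
    for i in range(m):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s[i] = x[2 * i] + 0.25 * (left + d[i])
    return s, d


def lift53_fused(x):
    """Fused version: both lifting steps in a single loop over the row."""
    n = len(x)
    m = n // 2  # assumption: n is even and n >= 2
    s = [0.0] * m
    d = [0.0] * m
    even = x[0]    # carried in a local so each input is read only once
    d_prev = None  # previous detail coefficient, carried between iterations
    for i in range(m):
        odd = x[2 * i + 1]
        # At the right border the mirrored sample x[n-2] is the current even.
        next_even = x[2 * i + 2] if 2 * i + 2 < n else even
        d_cur = odd - 0.5 * (even + next_even)   # predict
        left = d_cur if i == 0 else d_prev       # symmetric extension at i == 0
        s[i] = even + 0.25 * (left + d_cur)      # update
        d[i] = d_cur                             # each output written once
        d_prev = d_cur
        even = next_even
    return s, d
```

In the fused loop the intermediate values live only in local variables, so no output element is revisited after it has been stored. Both functions perform the same arithmetic in the same order and therefore return identical results; for example, a constant input yields detail coefficients that are all zero.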