The discrete wavelet transform (DWT), as defined by the JPEG-2000 image compression standard, is one of the most time-consuming computations and cannot be executed efficiently on current hardware architectures. This paper presents and compares several new architectures for domain-specific arrays that implement various DWT algorithms efficiently. A number of different algorithms are mapped onto these arrays to demonstrate the flexibility of the new embedded configurable SoC architectures and their ability to support implementations with different performance characteristics. Our results demonstrate an improvement of up to 59 percent over previously published work.