The wavelet transform hierarchically decomposes images with prescribed bases, while multilinear models search for optimal bases that adapt to visual data. In this paper, we integrate these two approaches to compactly represent 2D images and 3D volume data. Once a wavelet (packet) decomposition has been performed, the coefficients are subdivided into small blocks, most of which have small energy and are pruned. The surviving blocks usually exhibit strong redundancy among different channels and subbands. To exploit this property, we organize the surviving blocks into small tensors, group the tensors into clusters using an EM algorithm, and compactly approximate each cluster using tensor ensemble approximation. Experimental results on images and medical volume data indicate that our approach achieves better approximation quality than wavelet (packet) transforms.
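
The decomposition and block-pruning steps can be illustrated with a minimal sketch. It assumes a single-level orthonormal Haar transform on a 2D image with even side lengths, and it keeps a fixed fraction of the highest-energy blocks. The filter choice, the block size, the `keep_ratio` parameter, and the function names `haar2d` and `prune_blocks` are illustrative assumptions rather than the settings used in this paper. The later stages (tensor grouping, EM clustering, and tensor ensemble approximation) are not shown.

```python
import numpy as np


def haar2d(img):
    """One level of an orthonormal 2D Haar transform (assumed filter choice)."""
    s = np.sqrt(2.0)
    # Filter along rows: pairwise sums (lowpass) and differences (highpass).
    low = (img[0::2, :] + img[1::2, :]) / s
    high = (img[0::2, :] - img[1::2, :]) / s
    # Filter along columns to obtain the four subbands.
    ll = (low[:, 0::2] + low[:, 1::2]) / s
    lh = (low[:, 0::2] - low[:, 1::2]) / s
    hl = (high[:, 0::2] + high[:, 1::2]) / s
    hh = (high[:, 0::2] - high[:, 1::2]) / s
    return np.block([[ll, lh], [hl, hh]])


def prune_blocks(coeffs, block=4, keep_ratio=0.25):
    """Split coefficients into block x block tiles and zero low-energy tiles.

    `block` and `keep_ratio` are hypothetical parameters for illustration.
    Returns the pruned coefficients and a boolean mask of surviving tiles.
    """
    h, w = coeffs.shape
    # tiles[i, j] is the (i, j)-th block of size block x block.
    tiles = coeffs.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    energy = (tiles ** 2).sum(axis=(2, 3))
    n_keep = max(1, int(round(keep_ratio * energy.size)))
    threshold = np.sort(energy, axis=None)[-n_keep]
    mask = energy >= threshold
    pruned = tiles * mask[:, :, None, None]
    return pruned.swapaxes(1, 2).reshape(h, w), mask


rng = np.random.default_rng(0)
image = rng.random((16, 16))
coeffs = haar2d(image)
pruned, mask = prune_blocks(coeffs)
print(mask.sum(), "of", mask.size, "blocks survive")
```

Because the transform in this sketch is orthonormal, the energy removed by pruning equals the squared approximation error, which is why low-energy blocks are the natural candidates to discard.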