Abstract. Image fusion in high-resolution aerial imagery is challenging because of fine details and complex textures. In particular, color image fusion with virtual orthographic cameras provides a common representation for overlapping perspective aerial images. This paper proposes a variational formulation that tightly integrates redundant image data of urban environments. We introduce an efficient wavelet regularization that recovers fine image details with a natural appearance by performing joint inpainting and denoising on a given set of input observations. We first evaluate the framework on a setting with synthetic noise and then apply it to orthographic image generation from aerial imagery. In addition, we discuss an exemplar-based inpainting technique for the integrated removal of non-stationary objects such as cars.
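To make the idea of wavelet-regularized joint inpainting and denoising concrete, the following minimal sketch fuses several partially observed, noisy grayscale images by minimizing a masked quadratic data term plus an L1 penalty on wavelet detail coefficients. It is an illustration under stated assumptions, not the formulation proposed in this paper: the single-level Haar transform, the proximal-gradient (ISTA) solver, the grayscale setting, and all names (`haar2`, `ihaar2`, `fuse`, `energy`, `lam`) are hypothetical choices made for brevity.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar2(x):
    """One level of the orthonormal 2D Haar transform (even-sized input)."""
    a = (x[:, 0::2] + x[:, 1::2]) / SQRT2  # row-wise averages
    d = (x[:, 0::2] - x[:, 1::2]) / SQRT2  # row-wise differences
    ll = (a[0::2] + a[1::2]) / SQRT2
    lh = (a[0::2] - a[1::2]) / SQRT2
    hl = (d[0::2] + d[1::2]) / SQRT2
    hh = (d[0::2] - d[1::2]) / SQRT2
    return ll, lh, hl, hh


def ihaar2(ll, lh, hl, hh):
    """Exact inverse of haar2."""
    a = np.empty((2 * ll.shape[0], ll.shape[1]))
    d = np.empty_like(a)
    a[0::2], a[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[0::2], d[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((a.shape[0], 2 * a.shape[1]))
    x[:, 0::2], x[:, 1::2] = (a + d) / SQRT2, (a - d) / SQRT2
    return x


def soft(c, t):
    """Soft-thresholding, the proximal map of t * |c|."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def energy(u, obs, masks, lam):
    """Masked quadratic data term plus L1 norm of the Haar detail bands."""
    data = 0.5 * sum(np.sum(m * (u - f) ** 2) for f, m in zip(obs, masks))
    _, lh, hl, hh = haar2(u)
    return data + lam * sum(np.abs(c).sum() for c in (lh, hl, hh))


def fuse(obs, masks, lam=0.1, iters=200):
    """Fuse observations `obs` with 0/1 float validity `masks` via ISTA."""
    step = 1.0 / len(obs)  # 1/L, since the data-term gradient has L <= len(obs)
    count = sum(masks)
    total = sum(m * f for f, m in zip(obs, masks))
    # Start from the per-pixel mean of valid samples; pixels that were never
    # observed start from the global mean of all valid samples.
    u = np.where(count > 0, total / np.maximum(count, 1),
                 total.sum() / max(count.sum(), 1))
    for _ in range(iters):
        grad = sum(m * (u - f) for f, m in zip(obs, masks))
        ll, lh, hl, hh = haar2(u - step * grad)
        # Shrink only the detail bands; the coarse band is left unpenalized.
        u = ihaar2(ll, *(soft(c, step * lam) for c in (lh, hl, hh)))
    return u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.zeros((32, 32))
    clean[:, 16:] = 1.0  # synthetic step edge
    masks = [(rng.random(clean.shape) < 0.7).astype(float) for _ in range(3)]
    obs = [clean + 0.1 * rng.normal(size=clean.shape) for _ in range(3)]
    fused = fuse(obs, masks)
    print("RMSE of fused image:", np.sqrt(np.mean((fused - clean) ** 2)))
```

Because the Haar transform used here is orthonormal, the proximal step reduces to soft-thresholding the detail coefficients, which keeps each iteration cheap. A single decomposition level can only fill holes smaller than its 2×2 support, so a multi-level or redundant transform would be needed for larger occlusions.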