In cross-modal inference, we estimate complete fields from noisy and incomplete observations of one sensory modality using structure found in another sensory modality. This inference problem arises in several areas, including texture reconstruction and the reconstruction of geophysical fields. We propose a method for cross-modal inference that simultaneously learns shape recipes between two modalities and estimates missing information by using a prior on image structure gleaned from the alternate modality. In the absence of a physical basis for representing image priors, we use a statistical one that represents correlations in differential features. These correlations are represented efficiently using a perturbation sampling scheme. Using just one example of the alternate modality, we produce a factorized ensemble representation of feature correlations that yields efficient solutions to large spatial inference problems. We demonstrate the utility of this approach on cross-modal inference with depth and spect...
S. Ravela, Antonio B. Torralba, William T. Freeman
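
The factorized ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 1-D field, a Gaussian observation model with noise variance `r`, and correlations taken over raw values rather than differential features. The sampler `make_ensemble` (random circular shifts of a single example) is a hypothetical stand-in for the perturbation sampling scheme, and all function and parameter names are illustrative.

```python
import numpy as np

def make_ensemble(example, m, rng):
    """Hypothetical perturbation sampler: m random circular shifts of one example."""
    shifts = rng.integers(0, example.size, size=m)
    # Columns are ensemble members; result has shape (n, m).
    return np.stack([np.roll(example, s) for s in shifts], axis=1)

def ensemble_estimate(X, obs_idx, y, r):
    """Estimate a full field from observations y at obs_idx (assumed model).

    The prior covariance is kept in factored form C ~= A @ A.T and is
    never built explicitly; the only linear solve is m x m.
    """
    n, m = X.shape
    xb = X.mean(axis=1)                        # ensemble mean as prior mean
    A = (X - xb[:, None]) / np.sqrt(m - 1)     # scaled anomalies, shape (n, m)
    Ao = A[obs_idx]                            # anomalies at observed sites, (p, m)
    innov = y - xb[obs_idx]                    # observation minus prior mean
    # Push-through identity: Ao.T (Ao Ao.T + r I_p)^-1 = (Ao.T Ao + r I_m)^-1 Ao.T
    w = np.linalg.solve(Ao.T @ Ao + r * np.eye(m), Ao.T @ innov)
    return xb + A @ w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    example = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))  # synthetic stand-in field
    X = make_ensemble(example, m=16, rng=rng)
    obs_idx = np.arange(0, 200, 10)                        # sparse observation sites
    y = example[obs_idx] + 0.05 * rng.standard_normal(obs_idx.size)
    estimate = ensemble_estimate(X, obs_idx, y, r=0.05 ** 2)
    print(estimate.shape)
```

Under these assumptions the cost of the solve depends on the ensemble size `m` rather than on the field size `n`, which is one way a factored representation can keep large spatial problems tractable.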