Temporal prediction in standard video coding is performed in the spatial domain, where each pixel is predicted from a motioncompensated reconstructed pixel in a prior frame. This paper is premised on the realization that such standard prediction treats each pixel independently and ignores underlying spatial correlations, while transform-domain prediction would eliminate much of the spatial correlation before signal components (transform coefficients) are independently predicted. Moreover, the true temporal correlations emerge after signal decomposition, and vary considerably from low to high frequency components. This precise nature of the temporal dependencies is entirely masked in spatial domain prediction by the high temporal correlation coefficient ( 1) imposed on all pixels by the dominant low frequency components. We derive optimal transform-domain per-coefficient predictors for three main settings: basic inter-frame prediction; bi-directional prediction; and enhancement-layer ...