This work investigates the use of nonlinear dependencies in natural image sequence statistics to learn higher-order structures in natural videos. We propose a two-layer model that learns the variance correlation between linear ICA coefficients and present a novel nonlinear representation of natural videos. The first layer performs a linear mapping from pixel values to ICA coefficients, decomposing the spatiotemporal dynamics of natural videos into a set of bases, each encoding “independent motion.” Assuming that the nonlinear dependency among ICA coefficients takes the form of variance correlation, the second layer learns the joint distribution of ICA sources, which captures how these independent bases co-activate. Experimental results show that the abstract representation corresponds to various activation patterns of bases with similar motion, hence the term “motion patterns.” Our model offers a novel description of higher-order structures in natural videos. We illustrate...