Recent work on unsupervised feature learning has shown that learning on polynomial expansions of input patches, such as on pair-wise products of pixel intensities, can improve the performance of feature learners and extend their applicability to spatio-temporal problems, such as human action recognition or learning of image transformations. Learning of such higher order features, however, has been much more difficult than standard dictionary learning, because of the high dimensionality and because standard learning criteria are not applicable. Here, we show how one can cast the problem of learning higher-order features as the problem of learning a parametric family of manifolds. This allows us to apply a variant of a de-noising autoencoder network to learn higher-order features using simple gradient based optimization. Our experiments show that the approach can outperform existing higher-order models, while training and inference are exact, fast, and simple.