In many applications, modelling techniques are needed that take into account the inherent variability of the given data. In this paper, we present an approach to modelling class-specific pattern variation based on tangent distance within a statistical framework for classification. The model is an effective means of explicitly incorporating invariance with respect to transformations that do not change class membership, such as small affine transformations in the case of image objects. If no prior knowledge about the type of variability is available, it is desirable to learn the model parameters from the data. The probabilistic interpretation presented here allows us to view the learning of the variational derivatives as a maximum likelihood estimation problem. We present experimental results from two different real-world pattern recognition tasks, namely image object recognition and automatic speech recognition. On the US Postal Service handwritten digit recognition task, learning o...