By representing images and image prototypes as linear subspaces spanned by "tangent vectors" (derivatives of an image with respect to translation, rotation, etc.), impressive invariance to known types of uniform distortion can be built into feedforward discriminators. We describe a new probability model that jointly clusters data and learns mixtures of nonuniform, smooth deformation fields. Our fields are based on low-frequency wavelets, so they need very few parameters to model a wide range of smooth deformations (unlike, e.g., factor analysis, which needs a large number of parameters to model deformations). In spirit, our approach is closest to the separation of content from style proposed by Tenenbaum and Freeman. However, our models do not need labeled data for training, and thus allow unsupervised separation of appearance from deformation. We give results on handwritten digit recognition and face recognition.
Nebojsa Jojic, Patrice Simard, Brendan J. Frey, Da