In this paper we present a novel boosting algorithm for supervised learning that incorporates invariance to data transformations and has high generalization capabilities. While one can incorporate invariance by adding virtual samples to the data (e.g., by jittering), we adopt a much more efficient strategy and work along the lines of vicinal risk minimization and tangent distance methods. As in vicinal risk minimization, we incorporate invariance to data transformations by applying anisotropic smoothing along the directions of invariance. Moreover, as in tangent distance methods, we use a simple local approximation to such directions, thus obtaining an efficient computational scheme. We also show that optimal weak classifiers can be designed automatically by gradient descent. To increase efficiency at run time, these optimal weak classifiers are projected onto a Haar basis. This yields strong classifiers that are more computationally efficient than in the case of exhaust...