Invariant features or operators are often used to shield the recognition process from the effect of "nuisance" parameters, such as rotations, foreshortening, or illumination changes. From an information-theoretic point of view, imposing invariance results in reduced (rather than improved) system performance. However, in the case of small training samples, the situation is reversed, and invariant operators may reduce the misclassification rate. We propose an analysis of this behavior based on the bias-variance dilemma and present experimental results that confirm our theoretical expectations. In addition, we introduce the concept of "randomized invariants" for training, which can be used to mitigate the effect of small sample size.
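The "randomized invariants" idea can be read as training-time augmentation with randomly drawn nuisance transformations rather than an explicitly invariant operator. The sketch below illustrates this under assumptions: the function name, the choice of in-plane rotation as the nuisance parameter, and the number of replicated copies are all illustrative and not the paper's protocol.

```python
# A minimal sketch, assuming the nuisance parameter is in-plane rotation:
# each training image is replicated under randomly drawn rotations, so the
# classifier sees the nuisance variability instead of having it removed by
# an invariant operator. Rotation range and copy count are assumptions.
import numpy as np
from scipy.ndimage import rotate


def randomized_invariant_augmentation(images, labels, n_copies=5, rng=None):
    """Replicate each training image under random rotations (a nuisance
    parameter), returning an enlarged training set."""
    rng = np.random.default_rng(rng)
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        # keep the original sample
        aug_images.append(img)
        aug_labels.append(lab)
        # add randomly rotated copies with the same label
        for _ in range(n_copies):
            angle = rng.uniform(0.0, 360.0)
            aug_images.append(rotate(img, angle, reshape=False, mode="nearest"))
            aug_labels.append(lab)
    return np.stack(aug_images), np.array(aug_labels)
```

Any classifier can then be trained on the enlarged set; with small samples this trades some bias (the copies are not truly new observations) for a reduction in variance, which is the bias-variance trade-off the analysis addresses.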