We present an information-theoretic approach to learning a linear dimension-reduction transform for object classification. The guiding principle is that the transform should minimize the classification error, which, according to Fano’s optimal classification bound, amounts to maximizing the mutual information between the object class and the transformed feature. We propose a three-stage learning process. First, we use a support vector machine to select the subset of training samples that lie near the class boundaries. Second, we search this subset for the most informative samples, which serve as the initial transform bases. Third, we use hill climbing to refine these initial bases, one at a time, to maximize the mutual information between the transform coefficients and the object class distribution. We have applied the technique to face detection and present encouraging results.
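
The link between classification error and mutual information can be made explicit with the standard form of Fano's inequality. The notation below is assumed here for illustration and is not taken from the text: $C$ is the class label over $N_c$ classes, $\mathbf{y}$ is the transformed feature, $P_e$ is the probability of misclassification, $H_b$ is the binary entropy function, and all logarithms are base 2.

```latex
\begin{align}
  H(C \mid \mathbf{y})
    &\le H_b(P_e) + P_e \log_2 (N_c - 1)
     \le 1 + P_e \log_2 N_c , \\
  I(C; \mathbf{y})
    &= H(C) - H(C \mid \mathbf{y}) , \\
  P_e
    &\ge \frac{H(C) - I(C; \mathbf{y}) - 1}{\log_2 N_c} .
\end{align}
```

Since $H(C)$ and $N_c$ are fixed by the classification problem, the only term the transform can influence is $I(C; \mathbf{y})$. Increasing the mutual information lowers this bound on the achievable error, which is the sense in which it can stand in as the learning objective. Strictly, this lowers a lower bound on the error rather than the error itself.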