While mutual information-based methods have become popular for image registration, the question of which underlying feature to use is rarely discussed; it is implicitly assumed that intensity is the right feature to match. We depart from this tradition by beginning with a set of feature images: the original intensity image and three directional derivative feature images. This "feature extraction" is performed on both images in a typical intermodality registration setup. Assuming the existence of a training set of registered images, we find the best projection onto a single feature image by maximizing the normalized mutual information (NMI) between the two images with respect to the projection weights. After discovering the best feature to match under the NMI criterion, we apply the same projection coefficients to new test images. We show that affine NMI-based registration of the test images using the new best "feature" is more n...
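The projection-learning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 2-D images, uses `np.gradient` for two directional derivatives (the paper uses three), a histogram-based NMI estimate, and a coarse random search over unit-norm projection weights in place of whatever optimizer the paper employs.

```python
import numpy as np

def nmi(a, b, bins=32):
    """Normalized mutual information, NMI = (H(A) + H(B)) / H(A, B),
    estimated from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)          # marginal of A
    py = pxy.sum(axis=0)          # marginal of B
    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()
    return (entropy(px) + entropy(py)) / entropy(pxy)

def feature_stack(img):
    """Intensity plus directional derivatives (two here, for a 2-D sketch)."""
    gy, gx = np.gradient(img)
    return np.stack([img, gx, gy], axis=0)   # shape (n_features, H, W)

def best_projection(fixed, moving, n_trials=200, seed=0):
    """Search for unit-norm weights w that maximize NMI between the
    projected feature images w . F(fixed) and w . F(moving)."""
    rng = np.random.default_rng(seed)
    Ff, Fm = feature_stack(fixed), feature_stack(moving)
    best_w, best_s = None, -np.inf
    for _ in range(n_trials):
        w = rng.normal(size=Ff.shape[0])
        w /= np.linalg.norm(w)
        proj_f = np.tensordot(w, Ff, axes=1)  # single projected feature image
        proj_m = np.tensordot(w, Fm, axes=1)
        s = nmi(proj_f, proj_m)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s
```

In the setup the abstract describes, `best_projection` would be run on the registered training pair, and the resulting weights would then be fixed and reused to project the test images before affine NMI-based registration.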