In this paper, we address the question of decoding cognitive information from functional Magnetic Resonance (MR) images using classification techniques. The main bottleneck for accurate prediction is the selection of informative features (voxels). We develop a multivariate approach based on a mutual information (MI) criterion, estimated by nearest neighbors. This method can handle a large number of dimensions and can detect non-linear dependencies between the features and the label. We show that MI-based feature selection achieves better prediction performance with a sparser set of selected features than the reference method, a mass-univariate selection (ANOVA), and thus offers a better understanding of how information is coded within the brain.
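To illustrate why a nearest-neighbor MI criterion can detect dependencies that ANOVA misses, the following minimal Python sketch compares the two scores on a single synthetic feature whose label depends on it non-linearly. It is only an assumption-laden toy: the data are artificial integers rather than voxels, the estimator is a simple univariate nearest-neighbor variant (counting points strictly closer than the k-th same-label neighbor) rather than the multivariate criterion developed in this paper, and the names `digamma_int`, `knn_mutual_information` and `anova_f` are hypothetical helpers introduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_mutual_information(xs, labels, k=3):
    """Nearest-neighbor MI estimate (nats) between a scalar feature and a discrete label.

    Illustrative univariate variant only; NOT the multivariate criterion of the paper.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Distances from point i to the other points sharing its label.
        same = sorted(abs(xs[i] - xs[j]) for j in range(n)
                      if j != i and labels[j] == labels[i])
        if not same:
            raise ValueError("each label needs at least two samples")
        k_i = min(k, len(same))
        radius = same[k_i - 1]  # distance to the k-th same-label neighbor
        # Points of ANY label strictly inside that radius, plus point i itself.
        m = 1 + sum(1 for j in range(n)
                    if j != i and abs(xs[i] - xs[j]) < radius)
        total += digamma_int(k_i) - digamma_int(len(same) + 1) - digamma_int(m)
    return digamma_int(n) + total / n


def anova_f(xs, labels):
    """One-way ANOVA F statistic of a scalar feature grouped by label."""
    groups = {}
    for x, y in zip(xs, labels):
        groups.setdefault(y, []).append(x)
    n = len(xs)
    grand_mean = sum(xs) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    within = sum((x - means[y]) ** 2 for y, g in groups.items() for x in g)
    return (between / (len(groups) - 1)) / (within / (n - len(groups)))


# Toy feature: the label depends on |x|, so both classes share the same mean.
xs = list(range(-20, 21))
labels = [1 if abs(x) > 10 else 0 for x in xs]

f_score = anova_f(xs, labels)                 # class means coincide
mi = knn_mutual_information(xs, labels, k=3)  # dependence is still visible

print("ANOVA F:", f_score)
print("kNN MI (nats):", mi)
```

On this toy feature the two class means coincide, so the F statistic carries no signal, whereas the MI estimate remains clearly positive. A feature of this kind would be discarded by a mean-based univariate test but retained by an MI-based criterion.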