Kernel methods provide an efficient mechanism for deriving nonlinear algorithms. In classification as well as in feature extraction, kernel-based approaches map data that is not linearly separable in its original space into a space of intrinsically much higher dimensionality, where the data becomes (nearly) linearly separable and can be readily classified with existing, efficient linear methods. For a given kernel function, the main challenge is to determine the kernel parameters that map the original nonlinear problem to a linear one. This paper derives a Bayes optimal criterion for the selection of the kernel parameters in discriminant analysis. Our criterion selects the kernel parameters that maximize the (Bayes) classification accuracy in the kernel space. We also show how the same criterion can be used to perform subclass selection in the kernel space for problems with multimodal class distributions. Extensive experimental evaluation demonstrates the superiority of the proposed criterion over the st...
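To make the kernel-mapping idea concrete, the sketch below maps two classes that are not linearly separable in their original 2-D space into a higher-dimensional RBF feature space and then applies ordinary linear discriminant analysis there. Because the abstract does not spell out the Bayes optimal criterion itself, cross-validated classification accuracy is used here purely as a stand-in selection criterion for the kernel parameter; the data set, the Nystroem feature map, and the grid of gamma values are all illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: cross-validated accuracy stands in for the paper's
# Bayes optimal criterion, which is not described in this abstract.
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Two classes that are not linearly separable in the original 2-D space.
X, y = make_moons(n_samples=400, noise=0.15, random_state=0)

# Approximate the RBF kernel map, then classify with an existing linear
# method (LDA) in the induced higher-dimensional feature space.
model = Pipeline([
    ("kmap", Nystroem(kernel="rbf", n_components=100, random_state=0)),
    ("lda", LinearDiscriminantAnalysis()),
])

# Select the kernel parameter (gamma) that maximizes classification accuracy,
# mirroring in spirit the goal of choosing parameters under which the mapped
# data is as linearly separable as possible.
search = GridSearchCV(model, {"kmap__gamma": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print("best gamma:", search.best_params_["kmap__gamma"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```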