When constructing a classifier, the probability of correct classification of future data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization. We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly.
Gert R. G. Lanckriet, Laurent El Ghaoui, Chiranjib