We consider support vector machines for binary classification. As opposed to most approaches we use the number of support vectors (the "L0 norm") as a regularizing term instead of the L1 or L2 norms. In order to solve the optimization problem we use the cross entropy method to search over the possible sets of support vectors. The algorithm consists of solving a sequence of efficient linear programs. We report experiments where our method produces generalization errors that are similar to support vector machines, while using a considerably smaller number of support vectors.
Shie Mannor, Dori Peleg, Reuven Y. Rubinstein