Sparsity is a desirable property in high-dimensional learning. The ℓ1-norm regularization can lead to primal sparsity, while max-margin methods achieve dual sparsity. Combining these two methods, an ℓ1-norm max-margin Markov network (ℓ1-M3N) can achieve both types of sparsity. This paper analyzes its connections to the Laplace max-margin Markov network (LapM3N), which inherits the dual sparsity of max-margin models but is pseudo-primal sparse, and to a novel adaptive M3N (AdapM3N). We show that the ℓ1-M3N is an extreme case of the LapM3N, and that the ℓ1-M3N is equivalent to an AdapM3N. Based on this equivalence, we develop a robust EM-style algorithm for learning an ℓ1-M3N. We demonstrate the advantages of the simultaneously (pseudo-) primal and dual sparse models over models that enjoy only primal or only dual sparsity, on both synthetic and real data sets.
Jun Zhu, Eric P. Xing
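
To make the two sparsity notions concrete, the following is a minimal LaTeX sketch of the primal problems the abstract contrasts, written in standard M3N notation; the symbols w, Δf_i(y), Δℓ_i(y), C, and λ are assumptions drawn from the general max-margin Markov network literature, not verbatim from this paper.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch (assumed notation): the standard M3N primal. The 2-norm
% regularizer yields dual sparsity -- only a few margin constraints
% are active at the optimum (support vectors).
\begin{align}
  \min_{\mathbf{w},\,\boldsymbol{\xi}\ge 0}\;\;
    & \tfrac{1}{2}\lVert \mathbf{w}\rVert_2^2 + C\sum_{i=1}^{N}\xi_i \notag\\
  \text{s.t.}\;\;
    & \mathbf{w}^{\top}\Delta\mathbf{f}_i(y)\ \ge\ \Delta\ell_i(y)-\xi_i,
      \quad \forall i,\ \forall y\neq y^i, \notag
\end{align}
where $\Delta\mathbf{f}_i(y)=\mathbf{f}(x^i,y^i)-\mathbf{f}(x^i,y)$ and
$\Delta\ell_i(y)$ is the label loss. The $\ell_1$-M3N keeps the same margin
constraints but swaps in the 1-norm regularizer, which drives many weights
exactly to zero (primal sparsity):
% Same constraints as above; only the regularizer changes.
\begin{equation}
  \min_{\mathbf{w},\,\boldsymbol{\xi}\ge 0}\;\;
    \lambda\lVert \mathbf{w}\rVert_1 + C\sum_{i=1}^{N}\xi_i. \notag
\end{equation}
\end{document}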