Max-margin Markov networks (M³N) have shown great promise in structured prediction and relational learning. Due to the KKT conditions, the M³N enjoys dual sparsity. However, the existing M³N formulation does not enjoy primal sparsity, which is a desirable property for selecting significant features and reducing the risk of over-fitting. In this paper, we present an ℓ1-norm regularized max-margin Markov network (ℓ1-M³N), which enjoys dual and primal sparsity simultaneously. To learn an ℓ1-M³N, we present three methods: projected sub-gradient, cutting-plane, and a novel EM-style algorithm, which is based on an equivalence between ℓ1-M³N and an adaptive M³N. We perform extensive empirical studies on both synthetic and real data sets. Our experimental results show that: (1) ℓ1-M³N can effectively select significant features; (2) ℓ1-M³N can perform as well as the pseudo-primal sparse Laplace M³N in prediction accuracy, while consistently outperforming other competing method...
Jun Zhu, Eric P. Xing, Bo Zhang
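To give a concrete feel for the projected sub-gradient approach mentioned above, the following is a minimal sketch of ℓ1-regularized max-margin learning via sub-gradient descent with a soft-thresholding (proximal) step that induces primal sparsity. It uses a plain binary hinge loss on a toy linear model rather than the structured M³N objective, and all names (`l1_subgrad_train`, the data, the hyperparameters) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrinks each weight toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_subgrad_train(X, y, lam=0.1, lr=0.05, iters=500):
    """Sub-gradient descent on hinge loss + lam * ||w||_1 (toy binary sketch).

    The l1 term is handled by soft-thresholding after each gradient step,
    which drives weights of uninformative features exactly to zero.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        viol = margins < 1.0                              # margin violations
        grad = -(X[viol] * y[viol, None]).sum(axis=0) / n  # hinge sub-gradient
        w = soft_threshold(w - lr * grad, lr * lam)        # proximal l1 step
    return w

# Toy data: only the first 3 of 10 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = np.sign(X @ w_true)

w = l1_subgrad_train(X, y)
acc = np.mean(np.sign(X @ w) == y)
print("nonzero weights:", np.count_nonzero(w), "train accuracy:", acc)
```

The soft-thresholding step is what delivers the primal sparsity the abstract highlights: weights whose hinge-loss gradient cannot overcome the shrinkage `lr * lam` stay pinned at exactly zero, so the learned model selects a subset of features.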