Feature selection is important for achieving better generalization in high-dimensional learning, and structure learning of Markov random fields (MRFs) can automatically discover the inherent structures underlying complex data. Both problems can be cast as ℓ1-norm regularized parameter estimation. To solve such an ℓ1-regularized estimation problem, the existing Grafting method [16] avoids inference on dense graphs during structure learning by selecting new features incrementally. However, each time new features are included, Grafting performs a greedy step of optimizing over the free parameters. This greedy strategy is inefficient when parameter learning is itself non-trivial, as in MRFs, where it depends on an expensive subroutine to calculate gradients. The complexity of calculating gradients in MRFs is typically exponential in the size of the maximal cliques. In this paper, we present a fast algorithm called GraftingL...
Jun Zhu, Ni Lao, Eric P. Xing
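
The incremental feature-selection step that Grafting-style methods rely on can be sketched as follows. This is a minimal illustration under the standard ℓ1 optimality condition (a feature whose weight is zero can stay at zero as long as the magnitude of its loss gradient does not exceed the regularization constant), not the authors' implementation. The names `select_feature`, `grad`, `active`, and `lam` are hypothetical, and the gradient values are made up for the example.

```python
from typing import Optional, Sequence, Set


def select_feature(
    grad: Sequence[float], active: Set[int], lam: float
) -> Optional[int]:
    """Pick the inactive feature that most violates |dL/dw_i| <= lam.

    grad   : gradient of the unregularized loss w.r.t. every weight
    active : indices of features already included in the model
    lam    : l1 regularization constant (assumed positive)
    Returns the index to add next, or None if no inactive feature
    violates the condition.
    """
    best_index = None
    best_magnitude = lam  # a candidate must exceed lam to be selected
    for i, g in enumerate(grad):
        if i in active:
            continue  # already a free parameter, not a candidate
        if abs(g) > best_magnitude:
            best_index, best_magnitude = i, abs(g)
    return best_index


# Hypothetical gradient at the current weights; feature 2 is already active.
grad = [0.05, -0.9, 0.3, 0.0]
active = {2}
chosen = select_feature(grad, active, lam=0.5)
```

In this sketch only the selection test is shown. Computing `grad` is the expensive part in an MRF, since it requires inference, which is why the cost of re-optimizing the free parameters after every inclusion matters.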