Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
Charles A. Sutton, Andrew McCallum