Joseph K. Bradley, Carlos Guestrin

We present the first PAC bounds for learning parameters of Conditional Random Fields [12] with general structures over discrete and real-valued variables. Our bounds apply to composite likelihood [14], which generalizes maximum likelihood and pseudolikelihood [3]. Moreover, we show that the only existing algorithm with a PAC bound for learning high-treewidth discrete models [1] can be viewed as a computationally inefficient method for computing pseudolikelihood. We present an extensive empirical study of the statistical efficiency of these estimators, as predicted by our bounds. Finally, we use our bounds to show how to construct computationally and statistically efficient composite likelihood estimators.
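To make the pseudolikelihood objective that composite likelihood generalizes concrete, here is a minimal sketch for a binary pairwise Markov random field with ±1-valued variables, where each term log p(x_s | x_-s) has a closed logistic form. This is an illustrative example only, not the paper's estimator; the function name `neg_pseudolikelihood`, the ±1 encoding, and the symmetric zero-diagonal weight matrix are our assumptions.

```python
import numpy as np

def neg_pseudolikelihood(W, X):
    """Average negative pseudolikelihood for a binary (+/-1) pairwise MRF.

    W : (d, d) symmetric coupling matrix with zero diagonal (assumed form).
    X : (n, d) array of samples with entries in {-1, +1}.

    For this model, p(x_s | x_-s) = sigmoid(2 * x_s * h_s) where
    h_s = sum_t W[s, t] * x_t is the local field at variable s.
    """
    fields = X @ W                        # (n, d) local fields h_s per sample
    logits = 2.0 * X * fields             # 2 * x_s * h_s, elementwise
    log_sigmoid = -np.logaddexp(0.0, -logits)  # numerically stable log sigmoid
    return -log_sigmoid.sum() / X.shape[0]
```

A composite likelihood estimator would replace the single-variable conditionals p(x_s | x_-s) with conditionals over larger variable blocks p(x_A | x_-A); with blocks of size one this reduces to pseudolikelihood, and with a single block containing all variables it recovers maximum likelihood.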