When modeling structured domains, it is often desirable to leverage information that is not naturally expressed as a simple label. Examples include knowledge about the evaluation measure that will be used at test time and partial (weak) label information. When the additional information has structure that factorizes over small subsets of variables (i.e., it is low order, or decomposable), several approaches can be used to incorporate it into a learning procedure. Our focus in this work is the more challenging case, where the additional information does not factorize according to low order graphical model structure; we call this the high order case. We propose to formalize various forms of this additional information as high order loss functions, which may have complex interactions over large subsets of variables. We then address the computational challenges inherent in learning according to such loss functions, particularly focusing on the loss-augmented inference problem that a...
Daniel Tarlow, Richard S. Zemel