Sum-product networks are a new deep architecture that can perform fast, exact inference
on high-treewidth models. Only generative methods for training SPNs
have been proposed to date. In this paper, we present the first discriminative
training algorithms for SPNs, combining the high accuracy of discriminative learning with
the representational power and tractability of SPNs. We show that the class
of tractable discriminative SPNs is broader than the class of tractable generative
ones, and propose an efficient backpropagation-style algorithm for computing the
gradient of the conditional log-likelihood. Standard gradient descent suffers from
the gradient diffusion problem, but networks with many layers can be learned reliably using
“hard” gradient descent, where marginal inference is replaced by MPE inference
(i.e., inferring the most probable state of the non-evidence variables). The
resulting updates have a simple and intuitive form. We test discriminative SPNs
on standard image classi...