We discuss existing approaches to train LR parsers, which have been used for statistical resolution of structural ambiguity. These approaches are nonoptimal, in the sense that a collection of probability distributions cannot be obtained. In particular, some probability distributions expressible in terms of a context-free grammar cannot be expressed in terms of the LR parser constructed from that grammar, under the restrictions of the existing approaches to training of LR parsers. We present an alternative way of training that is provably optimal, and that allows all probability distributions expressible in the context-free grammar to be carried over to the LR parser. We also demonstrate empirically that this kind of training can be effectively applied on a large treebank.