We present a novel discriminative training algorithm for n-gram language models used in large-vocabulary continuous speech recognition. The algorithm uses large margin estimation (LME) to build an objective function that maximizes the minimum margin between correct transcriptions and their competing hypotheses, which are encoded as word graphs generated during Viterbi decoding. The nonlinear LME objective function is approximated by a linear EM-style auxiliary function, which leads to a linear programming problem that can be solved efficiently with convex optimization algorithms. Experimental results show that the proposed discriminative training method can outperform conventional discounting-based maximum likelihood estimation methods. A relative reduction in word error rate of over 2.5% is observed on the SPINE1 speech recognition task.
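
To make the "minimum margin" quantity concrete, the following minimal Python sketch computes it for a toy set of utterances. It is an illustration under stated assumptions, not the paper's implementation: the bigram model, the `floor` back-off value, the explicit competitor lists (standing in for word graphs), and the names `Utterance`, `lm_log_score`, `margin`, and `minimum_margin` are all hypothetical. The margin of an utterance is taken here as the total log score of the reference transcription minus that of its best-scoring competitor. LME would then adjust the language model parameters to make the smallest such margin as large as possible; that optimization step is not shown.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple

Bigram = Tuple[str, str]


class Utterance(NamedTuple):
    """Toy stand-in for one training utterance (hypothetical structure)."""
    reference: Tuple[str, ...]                 # correct transcription
    competitors: List[Tuple[str, ...]]         # competing hypotheses (assumed enumerated)
    acoustic: Dict[Tuple[str, ...], float]     # acoustic log score per hypothesis


def lm_log_score(words: Sequence[str], logprob: Dict[Bigram, float],
                 floor: float = -10.0) -> float:
    """Sum of bigram log-probabilities; unseen bigrams get an assumed floor value."""
    padded = ["<s>"] + list(words)
    return sum(logprob.get((a, b), floor) for a, b in zip(padded, padded[1:]))


def margin(utt: Utterance, logprob: Dict[Bigram, float]) -> float:
    """Reference score minus the best competitor score (positive means correct)."""
    def total(hyp: Tuple[str, ...]) -> float:
        return utt.acoustic[hyp] + lm_log_score(hyp, logprob)

    return total(utt.reference) - max(total(h) for h in utt.competitors)


def minimum_margin(utterances: Sequence[Utterance],
                   logprob: Dict[Bigram, float]) -> float:
    """Smallest margin over the set; this is the quantity LME seeks to maximize."""
    return min(margin(u, logprob) for u in utterances)
```

Because each hypothesis score in this sketch is a sum of n-gram log-probabilities, the margin is linear in those log-probabilities, which gives some intuition for why a linearized objective can be posed as a linear program.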