We describe a simple improvement to n-gram language models in which we estimate the distribution over closed-class (function) words separately from the conditional distribution of open-class words given function words. In English, function words account for about 30% of written language and form a natural skeleton for most sentences. By factoring a language model into a function-word model and a conditional model over open-class words given function words, we largely avoid the problem of sparse training data in the first phase and localize the need for sophisticated smoothing techniques primarily to the second, conditional model. We test our factored approach on the Brown and Wall Street Journal corpora and observe a 3.5% to 25.2% improvement in perplexity over standard methods, depending on the particular smoothing method and test set used. Compared to other proposals for improving n-gram language models, our factorization has the advantage of inherent simplicity and efficiency, a...
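To make the factorization concrete, the following is a minimal sketch, not the paper's implementation: a toy bigram formulation with add-one smoothing, in which a sentence's score is the product of a skeleton model (function words plus a generic content-word placeholder) and a conditional model of each content word given its preceding skeleton token. The function-word list, the `<C>` placeholder, and the single-token conditioning context are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

# Hypothetical, tiny closed-class inventory; a real system would use a full list.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was"}
CONTENT = "<C>"  # placeholder marking an open-class (content) word slot


def skeleton(tokens):
    """Map a sentence to its function-word skeleton: content words become <C>."""
    return [t if t in FUNCTION_WORDS else CONTENT for t in tokens]


class FactoredBigram:
    """Toy factored model: P(sentence) ~= P(skeleton) * P(content words | skeleton context)."""

    def __init__(self):
        self.skel_bigrams = Counter()     # (prev, cur) counts over skeleton tokens
        self.skel_histories = Counter()   # prev counts over skeleton tokens
        self.cond_counts = defaultdict(Counter)  # skeleton context -> content-word counts
        self.content_vocab = set()

    def train(self, sentences):
        for tokens in sentences:
            skel = ["<s>"] + skeleton(tokens) + ["</s>"]
            for prev, cur in zip(skel, skel[1:]):
                self.skel_bigrams[(prev, cur)] += 1
                self.skel_histories[prev] += 1
            # Conditional model: content word given the preceding skeleton token.
            padded = ["<s>"] + tokens
            for prev, cur in zip(padded, padded[1:]):
                if cur not in FUNCTION_WORDS:
                    ctx = prev if (prev in FUNCTION_WORDS or prev == "<s>") else CONTENT
                    self.cond_counts[ctx][cur] += 1
                    self.content_vocab.add(cur)

    def logprob(self, tokens):
        """Add-one smoothed log-probability under the two-factor model."""
        lp = 0.0
        # Factor 1: skeleton (function-word) bigram model -- dense, little sparsity.
        skel = ["<s>"] + skeleton(tokens) + ["</s>"]
        v_skel = len(FUNCTION_WORDS) + 3  # function words + <C>, <s>, </s>
        for prev, cur in zip(skel, skel[1:]):
            lp += math.log((self.skel_bigrams[(prev, cur)] + 1)
                           / (self.skel_histories[prev] + v_skel))
        # Factor 2: content words conditioned on skeleton context -- needs real smoothing.
        padded = ["<s>"] + tokens
        v_content = len(self.content_vocab) + 1
        for prev, cur in zip(padded, padded[1:]):
            if cur not in FUNCTION_WORDS:
                ctx = prev if (prev in FUNCTION_WORDS or prev == "<s>") else CONTENT
                counts = self.cond_counts[ctx]
                lp += math.log((counts[cur] + 1) / (sum(counts.values()) + v_content))
        return lp


if __name__ == "__main__":
    model = FactoredBigram()
    model.train([["the", "cat", "sat", "on", "the", "mat"],
                 ["a", "dog", "was", "in", "the", "garden"]])
    print(model.logprob(["the", "dog", "sat", "on", "a", "mat"]))
```

The sketch uses add-one smoothing in both factors only for brevity; the point of the factorization is that the skeleton model is already well estimated from modest data, so more sophisticated smoothing can be concentrated on the conditional content-word model.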