Abstract. We show several PAC-style concentration bounds for learning unigram language models. One interesting quantity is the total probability of all words appearing exactly $k$ times in a sample of size $m$. A standard estimator for this quantity is the Good-Turing estimator. The existing analysis of its error shows a PAC bound of approximately $O\left(\frac{k}{\sqrt{m}}\right)$. We improve its dependency on $k$ to $O\left(\frac{\sqrt[4]{k}}{\sqrt{m}} + \frac{k}{m}\right)$. We also analyze the empirical frequencies estimator, showing that its PAC error bound is approximately $O\left(\frac{1}{k} + \frac{\sqrt{k}}{m}\right)$. We derive a combined estimator, which has an error of approximately $O\left(m^{-2/5}\right)$ for any $k$. A standard measure for the quality of a learning algorithm is its expected per-word log-loss. We show that the leave-one-out method can be used for estimating the log-loss of the unigram model with a PAC error of approximately $O\left(\frac{1}{\sqrt{m}}\right)$, for any underlying distribution. We also bound the log-loss a priori, as a function of various parameters of the distribution.
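For orientation, the two estimators named above admit a standard formulation; the notation below ($n_k$, $M_k$) is assumed here for illustration and is not fixed by the abstract itself, so the paper's exact definitions may differ slightly in presentation:

```latex
% Assumed notation: n_k = number of distinct words occurring exactly k times
% in the sample of size m; M_k = sum of p(w) over those words is the
% quantity being estimated.
\[
  G_k \;=\; \frac{(k+1)\, n_{k+1}}{m}
  \qquad \text{(Good-Turing estimator of } M_k\text{)},
\]
\[
  \widehat{M}_k \;=\; \frac{k\, n_k}{m}
  \qquad \text{(empirical frequencies estimator of } M_k\text{)}.
\]
```

Roughly, the Good-Turing estimator is more reliable for small $k$ and the empirical frequencies estimator for large $k$, which is what makes a combined estimator with a uniform $O\left(m^{-2/5}\right)$ bound natural.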