Abstract. We consider a problem that is related to the “Universal Encoding Problem” from information theory. The basic goal is to find rules that map “partial information” about a distribution X over an m-letter alphabet into a guess X̂ for X such that the Kullback-Leibler divergence between X and X̂ is as small as possible. The cost associated with a rule is the maximal expected Kullback-Leibler divergence between X and X̂. First, we show that the cost associated with the well-known add-one rule equals ln(1 + (m − 1)/(n + 1)), thereby extending a result of Forster and Warmuth [3, 2] to m ≥ 3. Second, we derive an absolute (as opposed to asymptotic) lower bound on the smallest possible cost. Technically, this is done by determining (almost exactly) the Bayes error of the add-one rule with a uniform prior (where the asymptotics for n → ∞ were known before). Third, we point to tools from approximation theory and support the conjecture that there exists a rule whose cost asympt...
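
For the reader's convenience, here is a minimal sketch of the add-one (Laplace) rule in its standard form; the count notation $n_i$ and the i.i.d. sampling model are our assumptions and are not spelled out in the abstract itself:

% Add-one (Laplace) rule: from n i.i.d. samples with empirical counts
% n_1, ..., n_m (so n = n_1 + ... + n_m), estimate the distribution X by
\[
  \hat{X}(i) \;=\; \frac{n_i + 1}{n + m}, \qquad i = 1, \dots, m .
\]
% The first result stated above asserts that the worst-case (over X) expected
% Kullback-Leibler divergence of this rule equals \ln\bigl(1 + \tfrac{m-1}{n+1}\bigr).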