Sciweavers

COLT
2000
Springer

On the Convergence Rate of Good-Turing Estimators

Good-Turing adjustments of word frequencies are an important tool in natural language modeling. In particular, for any sample of words, there is a set of words not occurring in that sample. The total probability mass of the words not in the sample is the so-called missing mass. Good showed that the fraction of the sample consisting of words that occur only once in the sample is a nearly unbiased estimate of the missing mass. Here, we give a PAC-style high-probability confidence interval for the actual missing mass. More generally, for a given count k, we give a confidence interval for the true probability mass of the set of words occurring k times in the sample.
David A. McAllester, Robert E. Schapire
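
As an illustration of the estimate described in the abstract, the sketch below computes the fraction of a sample made up of words that occur exactly once, which is Good's estimate of the missing mass. The function name good_turing_mass and the toy sample are hypothetical. The general-k branch uses the classical textbook form (k + 1) * N_{k+1} / n, which is an assumption here and may differ in normalization from the estimator analyzed in the paper. The confidence intervals themselves are not reproduced, since the abstract does not state them.

```python
from collections import Counter


def good_turing_mass(sample, k=0):
    """Estimate the total probability mass of words seen exactly k times.

    For k = 0 this is the missing-mass estimate from the abstract: the
    fraction of the sample consisting of words that occur only once.
    For k > 0 it uses the classical form (k + 1) * N_{k+1} / n, which is
    an assumption of this sketch, not a formula taken from the abstract.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if k < 0:
        raise ValueError("k must be non-negative")

    word_counts = Counter(sample)                    # word -> occurrences
    count_of_counts = Counter(word_counts.values())  # r -> number of words seen r times

    # N_{k+1}: number of distinct words occurring exactly k + 1 times.
    n_next = count_of_counts.get(k + 1, 0)
    return (k + 1) * n_next / n


if __name__ == "__main__":
    # Toy sample (hypothetical): "a" and "d" occur once, "b" twice, "c" three times.
    toy_sample = "a b b c c c d".split()
    print("estimated missing mass:", good_turing_mass(toy_sample))
    print("estimated mass of words seen once:", good_turing_mass(toy_sample, k=1))
```

In the toy sample, two of the seven tokens are words seen only once, so the missing-mass estimate is 2/7. How far the true missing mass can be from this value is the question the paper's confidence interval addresses.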
Added: 02 Aug 2010
Updated: 02 Aug 2010
Type: Conference
Year: 2000
Where: COLT
Authors: David A. McAllester, Robert E. Schapire