On the Convergence Rate of Good-Turing Estimators

Good-Turing adjustments of word frequencies are an important tool in natural language modeling. In particular, for any sample of words, there is a set of words not occurring in that sample. The total probability mass of the words not in the sample is the so-called missing mass. Good showed that the fraction of the sample consisting of words that occur only once in the sample is a nearly unbiased estimate of the missing mass. Here, we give a PAC-style high-probability confidence interval for the actual missing mass. More generally, for k > 0, we give a confidence interval for the true probability mass of the set of words occurring k times in the sample.
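
The estimator described in the abstract can be illustrated with a minimal Python sketch. It is not code from the paper: the function names, the toy sample, and the toy distribution are all hypothetical. The k = 0 case is the singleton fraction from the abstract; the general form (k + 1) * N_{k+1} / n, where N_{k+1} is the number of distinct words seen exactly k + 1 times, is the standard Good-Turing estimate and is assumed here rather than taken from the abstract. The paper's confidence intervals are not reproduced.

```python
from collections import Counter


def good_turing_mass(sample, k=0):
    """Good-Turing estimate of the total probability mass of the words
    that occur exactly k times in `sample`.

    Uses the standard formula (k + 1) * N_{k+1} / n, where N_{k+1} is the
    number of distinct words seen exactly k + 1 times and n is the sample
    size. For k = 0 this is the fraction of the sample made up of words
    seen once, i.e. the missing-mass estimate.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    n_k_plus_1 = sum(1 for c in counts.values() if c == k + 1)
    return (k + 1) * n_k_plus_1 / n


def true_missing_mass(sample, distribution):
    """Actual missing mass: total probability of the words absent from
    `sample`, given the (normally unknown) word distribution as a dict."""
    seen = set(sample)
    return sum(p for word, p in distribution.items() if word not in seen)


# Hypothetical toy data, for illustration only.
sample = "the cat sat on the mat".split()
distribution = {"the": 0.3, "cat": 0.1, "sat": 0.1, "on": 0.1,
                "mat": 0.1, "dog": 0.2, "ran": 0.1}

estimate = good_turing_mass(sample)             # 4 singletons out of 6 words
actual = true_missing_mass(sample, distribution)  # mass of "dog" and "ran"
print(f"estimated missing mass: {estimate:.3f}, actual: {actual:.3f}")
```

On a sample this small the estimate and the actual missing mass can differ widely; how quickly that gap shrinks as the sample grows is the question the paper's confidence intervals address.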

David A. McAllester, Robert E. Schapire

COLT 2000 | Machine Learning | Missing Mass | Probability Mass | So-called Missing Mass |

Post Info

Added	02 Aug 2010
Updated	02 Aug 2010
Type	Conference
Year	2000
Where	COLT
Authors	David A. McAllester, Robert E. Schapire
