A fully Bayesian approach to unsupervised part-of-speech tagging



Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using...
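To make the contrast between a point estimate and integrating over parameters concrete, here is a minimal sketch, not the authors' implementation. It assumes a single multinomial over tags with a symmetric Dirichlet prior, for which integrating out the parameters gives the standard closed-form predictive probability (n_k + alpha) / (N + K * alpha). The function names, the toy counts, and the value alpha = 0.1 are all illustrative assumptions; a small alpha (below 1) is the kind of prior that favors sparse distributions.

```python
from collections import Counter


def mle_prob(counts, outcome):
    """Maximum-likelihood estimate: relative frequency of the outcome."""
    total = sum(counts.values())
    return counts[outcome] / total if total else 0.0


def dirichlet_predictive(counts, outcome, num_outcomes, alpha):
    """Predictive probability with the multinomial parameters integrated out
    under a symmetric Dirichlet(alpha) prior (hypothetical helper)."""
    total = sum(counts.values())
    return (counts[outcome] + alpha) / (total + num_outcomes * alpha)


# Toy counts of tags observed in some context (made-up data for illustration).
tags = ["NOUN", "VERB", "ADJ", "DET"]
counts = Counter({"NOUN": 3, "VERB": 1})
alpha = 0.1  # assumed value; alpha < 1 favors sparse distributions

for tag in tags:
    print(tag, mle_prob(counts, tag),
          dirichlet_predictive(counts, tag, len(tags), alpha))
```

Under MLE the unseen tags receive probability exactly zero, whereas the integrated-out estimate keeps a small amount of mass on them while still concentrating most of the probability on the frequent tags. In a trigram HMM the same formula would be applied separately to each transition and emission distribution, which is an extrapolation from the abstract rather than a detail it states.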

Sharon Goldwater, Tom Griffiths
