Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation



We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states, and it is known to perform poorly in unsupervised and semisupervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.
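The two ideas in the abstract can be illustrated with a small sampling sketch. This is not the authors' implementation: the function names, the hyperparameter values, and the use of symmetric Dirichlet priors are all assumptions made for illustration. The first function mimics the HMM+ idea of giving content and function word states different hyperparameters; the second mimics drawing a separate state prior for each local context rather than a single draw shared by the whole corpus.

```python
import numpy as np


def draw_emissions(n_content, n_function, vocab_size,
                   beta_content=1.0, beta_function=0.1, seed=0):
    """Draw one word distribution per state (hypothetical HMM+-style sketch).

    Content states use beta_content and function states use beta_function
    as the symmetric Dirichlet concentration. Both values are placeholders,
    not the settings used in the paper.
    """
    rng = np.random.default_rng(seed)
    betas = [beta_content] * n_content + [beta_function] * n_function
    rows = [rng.dirichlet(np.full(vocab_size, b)) for b in betas]
    return np.vstack(rows)  # shape: (n_content + n_function, vocab_size)


def draw_context_state_priors(n_contexts, n_states, alpha=0.5, seed=0):
    """Draw a separate distribution over states for every local context.

    A standard HMM would draw such a prior once for the whole corpus;
    here each context (for example, a document) gets its own draw.
    """
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.full(n_states, alpha), size=n_contexts)


emissions = draw_emissions(n_content=3, n_function=2, vocab_size=20)
context_priors = draw_context_state_priors(n_contexts=4, n_states=5)
print(emissions.shape, context_priors.shape)
```

Each row of both arrays is a probability distribution. A smaller concentration value tends to put most of a row's mass on a few entries, which is one way a prior could encode that some states cover only a small set of word types.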

Taesun Moon, Katrin Erk, Jason Baldridge


EMNLP 2010 | Function Word | Local Document Context | Natural Language Processing | Standard HMM


Added	11 Feb 2011
Updated	11 Feb 2011
Type	Journal
Year	2010
Where	EMNLP
Authors	Taesun Moon, Katrin Erk, Jason Baldridge
