Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% highe...
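Two ideas in the abstract can be made concrete with a small sketch: the ablation that forces every word to carry a single tag, and the directed accuracy metric. The abstract does not say how the single tag is chosen, so picking each word's most frequent tag is an assumption here, and the function names are hypothetical. This is an illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict


def collapse_to_one_tag_per_word(tagged_sentences):
    """Give every occurrence of a word the same tag.

    Assumption: the tag kept for a word is the one it carries most often
    in the input; the abstract does not specify the selection rule.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    best = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return [[(word, best[word]) for word, _ in sentence]
            for sentence in tagged_sentences]


def directed_accuracy(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index matches the gold head."""
    total = sum(len(sentence) for sentence in gold_heads)
    correct = sum(
        g == p
        for gold, pred in zip(gold_heads, predicted_heads)
        for g, p in zip(gold, pred)
    )
    return correct / total if total else 0.0


# Toy data: "plans" is a noun twice and a verb once.
sentences = [
    [("the", "DT"), ("plans", "NNS")],
    [("she", "PRP"), ("plans", "VBZ")],
    [("plans", "NNS")],
]
collapsed = collapse_to_one_tag_per_word(sentences)

# Head indices per token (0 marks the root); two of three heads agree.
score = directed_accuracy([[2, 0, 2]], [[2, 0, 1]])
```

After collapsing, the verb reading of "plans" in the second sentence is lost, which is the kind of context-dependence the ablation removes.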

Valentin I. Spitkovsky, Hiyan Alshawi, Angel X. Chang, Daniel Jurafsky

EMNLP 2011 | Grammar Induction | Human Annotation | Natural Language Processing | Wall Street Journal (WSJ)

Post Info

Added	20 Dec 2011
Updated	20 Dec 2011
Type	Conference
Year	2011
Where	EMNLP
Authors	Valentin I. Spitkovsky, Hiyan Alshawi, Angel X. Chang, Daniel Jurafsky

Sciweavers