

A Universal Part-of-Speech Tagset

To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts of speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold-standard part-of-speech tags.
Slav Petrov, Dipanjan Das, Ryan McDonald
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2011
Where CoRR
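The abstract describes mapping treebank-specific tags onto twelve universal categories (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, "." for punctuation, and X). As a rough illustration only, the Python sketch below applies such a mapping to Penn Treebank-tagged tokens. The dictionary is a small, hand-picked subset written for this example, not the authors' released mapping files. `PTB_TO_UNIVERSAL` and `to_universal` are hypothetical names, and sending unlisted tags to `X` is an assumption of this sketch.

```python
# Hypothetical sketch: the twelve universal categories named in the paper.
UNIVERSAL_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
                  "ADP", "NUM", "CONJ", "PRT", ".", "X"}

# Partial, illustrative Penn Treebank -> universal mapping (not the full released file).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "PRP": "PRON", "PRP$": "PRON",
    "DT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT",
    ".": ".", ",": ".",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL):
    """Convert (word, fine_tag) pairs to (word, universal_tag) pairs.

    Tags absent from the mapping fall back to "X"; this default is an
    assumption of the sketch, not part of the published mapping.
    """
    return [(word, mapping.get(tag, "X")) for word, tag in tagged]


sentence = [("The", "DT"), ("dogs", "NNS"), ("barked", "VBD"),
            ("loudly", "RB"), (".", ".")]
print(to_universal(sentence))
# [('The', 'DET'), ('dogs', 'NOUN'), ('barked', 'VERB'), ('loudly', 'ADV'), ('.', '.')]
```

Because the mapping is a plain lookup table, covering another treebank would amount to supplying a different dictionary while the conversion function stays the same.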