A Universal Part-of-Speech Tagset


Download www.petrovi.de

To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold standard part-of-speech tags.
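As a rough illustration of what such a mapping looks like in practice, the sketch below converts a few Penn Treebank tags to universal categories. The dictionary is a small, hand-picked subset written for illustration and is an assumption, not the authors' full mapping file; the names `PTB_TO_UNIVERSAL` and `to_universal`, and the choice to fall back to `X` for unknown tags, are hypothetical.

```python
# The twelve universal categories, as commonly listed for this tagset.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative subset of a Penn Treebank -> universal mapping.
# Assumption: these entries are for demonstration only; the real
# resource covers every tag of each source treebank.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON",
    "DT": "DET", "IN": "ADP", "CD": "NUM",
    "CC": "CONJ", "RP": "PRT", ".": ".",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL, fallback="X"):
    """Map (word, fine_tag) pairs to (word, universal_tag) pairs.

    Tags missing from the mapping are sent to the catch-all category
    (a design choice made for this sketch).
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged]


sentence = [("The", "DT"), ("dog", "NN"), ("barked", "VBD"),
            ("loudly", "RB"), (".", ".")]
print(to_universal(sentence))
```

Because every source tagset is reduced to the same twelve labels, sentences tagged under different treebank conventions can be compared or pooled after this step.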

Slav Petrov, Dipanjan Das, Ryan McDonald


CORR 2011 | Education | Original Treebank Data | Universal Part-of-Speech Categories | Unsupervised Grammar Induction


» A Common Parts-of-Speech Tagset Framework for Indian Languages

Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2011
Where	CORR
Authors	Slav Petrov, Dipanjan Das, Ryan McDonald
