Designing and Evaluating a Russian Tagset

This paper reports the principles behind the design of a tagset covering Russian morphosyntactic phenomena, modifications to the core tagset, and its evaluation. The tagset and its associated morphosyntactic specifications are based on the MULTEXT-East framework; the design decisions aimed to balance the parameters that matter to linguists against the feasibility of detecting and disambiguating them automatically. The final tagset contains about 600 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set of tagging models and corpora that can be shared with other researchers.

Serge Sharoff, Mikhail Kopotev, Tomaz Erjavec, Anna Feldman, Dagmar Divjak

Associated Morphosyntactic Specifications | Education | LREC 2008 | Morphosyntactic | Russian Morphosyntactic Phenomena |

» The Development of a Morphosyntactic Tagset for Afrikaans and its Use with Statistical Tag...

» LIRICS Semantic Role Annotation Design and Evaluation of a Set of Data Categories

» Automatic geotagging of Russian web sites

» Applying Conditional Random Fields to Japanese Morphological Analysis

» Using the Web for Language Independent Spellchecking and Autocorrection

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Serge Sharoff, Mikhail Kopotev, Tomaz Erjavec, Anna Feldman, Dagmar Divjak
