Morphosyntactic Tagging of Slovene Using Progol

Abstract. We consider the task of tagging Slovene words with morphosyntactic descriptions (MSDs). MSDs contain not only part-of-speech information but also attributes such as gender and case. In the case of Slovene there are 2,083 possible MSDs. P-Progol was used to learn morphosyntactic disambiguation rules from annotated data (consisting of 161,314 examples) produced by the MULTEXT-East project. P-Progol produced 1,148 rules taking 36 hours. Using simple grammatical background knowledge, e.g. looking for case disagreement, P-Progol induced 4,094 clauses in eight parallel runs. These rules have proved effective at detecting and explaining incorrect MSD annotations in an independent test set, but have not so far produced a tagger comparable to other existing taggers in terms of accuracy.
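
To make the idea of "looking for case disagreement" concrete, here is a minimal Python sketch of how such a check could rule out candidate readings for two adjacent words. It is not the paper's Progol background knowledge or any of its induced rules. The dictionary representation of an MSD, the attribute names, and the functions `case_disagreement` and `consistent_readings` are all hypothetical, chosen only for illustration.

```python
from itertools import product


def case_disagreement(left, right):
    """Return True if both tags carry a case attribute and the cases differ.

    Tags are plain dicts (an assumed, simplified stand-in for real MSDs).
    """
    left_case = left.get("case")
    right_case = right.get("case")
    return (
        left_case is not None
        and right_case is not None
        and left_case != right_case
    )


def consistent_readings(left_candidates, right_candidates):
    """Keep only the candidate tag pairs that do not disagree in case."""
    return [
        (left, right)
        for left, right in product(left_candidates, right_candidates)
        if not case_disagreement(left, right)
    ]


# Hypothetical ambiguity: an adjective with two possible cases,
# followed by a noun whose case is unambiguous.
adjective_tags = [
    {"pos": "adjective", "case": "nominative"},
    {"pos": "adjective", "case": "accusative"},
]
noun_tags = [{"pos": "noun", "case": "accusative"}]

pairs = consistent_readings(adjective_tags, noun_tags)
```

In this toy setting only the accusative reading of the adjective survives. Treating every adjacent pair as if it had to agree is a deliberate simplification; a learned rule would be conditioned on more context than this.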

James Cussens, Saso Dzeroski, Tomaz Erjavec

Automated Reasoning | ILP 1999 | Morphosyntactic Descriptions | Morphosyntactic Disambiguation Rules | Possible MSDs

Added	04 Aug 2010
Updated	04 Aug 2010
Type	Conference
Year	1999
Where	ILP
Authors	James Cussens, Saso Dzeroski, Tomaz Erjavec
