Tagging a Hebrew Corpus: the Case of Participles



We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew with four requirements in mind: the tagset should be consistent with common linguistic knowledge; human taggers should agree as much as possible on the tags they assign, so that the annotation stays consistent; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications that rely on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni (participle) forms in Hebrew. We explain how this tag is defined and how it helped us raise manual tagging accuracy to a high level, while also improving automatic tagging and helping in the task of syntactic chunking.

Meni Adler, Yael Dahan Netzer, Yoav Goldberg, David Gabay, Michael Elhadad


Common Linguistic Knowledge | Education | LREC 2008 | Manual Tagging Accuracy | Modern Hebrew


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Meni Adler, Yael Dahan Netzer, Yoav Goldberg, David Gabay, Michael Elhadad
