

Tagging a Hebrew Corpus: the Case of Participles

We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew with four goals in mind: the tagset should be consistent with common linguistic knowledge; human taggers should agree as much as possible on the tags they assign, so that the annotation stays consistent; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications that rely on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni (participle) forms in Hebrew. We explain how this tag is defined and how it helped us improve manual tagging accuracy to a high level, while also improving automatic tagging and helping in the task of syntactic chunking.
Added: 29 Oct 2010
Updated: 29 Oct 2010
Type: Conference
Year: 2008
Where: LREC
Authors: Meni Adler, Yael Dahan Netzer, Yoav Goldberg, David Gabay, Michael Elhadad