NLE
2008

Part-of-speech tagging of Modern Hebrew text

Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a part-of-speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If the tokenization is at the word level, the output tags must be complex and represent both the segmentation of the word and the POS tag assigned to each word segment. If the tokenization is at the segment level, the input itself must encode the alternative segmentations of the words, while the output consists of standard POS tags. Comparing these two alternatives is not trivial, as the choice between them may have global effects on the grammatical model. Moreover, intermediate levels of tokenization between these two extremes are conceivable and, as we will aim to show, beneficial. To...
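The contrast between the two tokenization levels described in the abstract can be illustrated with a small Python sketch. This is not the authors' tagger: the toy word "abc", its segmentations, the tag names, and the helper functions `word_level_tags` and `segment_level_input` are all hypothetical, chosen only to show how the ambiguity moves from the output tags (word level) to the input (segment level).

```python
# Hypothetical analyses of one toy word. Each analysis is a list of
# (segment, POS tag) pairs. The word, segments, and tags are invented
# for illustration and are not taken from the paper.
ANALYSES = {
    "abc": [
        [("abc", "NOUN")],
        [("a", "PREP"), ("bc", "NOUN")],
        [("a", "PREP"), ("bc", "VERB")],
    ]
}


def word_level_tags(word):
    """Word-level tokenization: the token is the whole word, so each
    candidate tag is complex and encodes both the segmentation and the
    POS tag of every segment."""
    return [
        "+".join(f"{seg}/{tag}" for seg, tag in analysis)
        for analysis in ANALYSES[word]
    ]


def segment_level_input(word):
    """Segment-level tokenization: the input lists the alternative
    segmentations of the word, and each segment then receives a
    standard (simple) POS tag."""
    alternatives = {}
    for analysis in ANALYSES[word]:
        segments = tuple(seg for seg, _ in analysis)
        tags = tuple(tag for _, tag in analysis)
        alternatives.setdefault(segments, []).append(tags)
    return alternatives


if __name__ == "__main__":
    print(word_level_tags("abc"))
    print(segment_level_input("abc"))
```

At the word level, the tagger chooses among three complex tags for a single token. At the segment level, it first faces two alternative segmentations in the input and then assigns simple tags to the segments of the one it selects.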
Roy Bar-Haim, Khalil Sima'an, Yoad Winter
Added 14 Dec 2010
Updated 14 Dec 2010
Type Journal
Year 2008
Where NLE