ANLP
1997

Probabilistic and Rule-Based Tagger of an Inflective Language - a Comparison

We present results of probabilistic tagging of Czech texts to show how these techniques work for a highly morphologically ambiguous inflective language. After describing the tag system used, we report four experiments that tag Czech texts with a simple probabilistic model (one unigram, two bigram, and one trigram experiment). For comparison, we applied the same code and settings to an English text (another four experiments), using training and test data of the same size so that the comparison is valid. The experiments use the source channel model and maximum likelihood training on a hand-tagged Czech corpus and on the tagged Wall Street Journal (WSJ) corpus from the LDC collection. They show, not surprisingly, that the more training data is available, the higher the success rate. The results also indicate that for inflective languages with 1000+ tags we have to develop a more sophisticated approach...
Jan Hajic, Barbora Hladká
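
To make the bigram setting mentioned in the abstract concrete, here is a minimal sketch of a source channel tagger with maximum likelihood (relative frequency) estimates and Viterbi decoding. It is not the authors' code: the function names, the `START` marker, and the small probability floor for unseen events are assumptions made for illustration, and the abstract does not say how the paper handles unseen words or tags.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker, not taken from the paper


def train(tagged_sentences):
    """Collect counts for maximum likelihood estimates from hand-tagged sentences."""
    transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
    emissions = defaultdict(Counter)    # tag -> counts of the words it emits
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
    return transitions, emissions


def log_prob(counter, key, floor=1e-6):
    """Log relative frequency; unseen events get a small floor (an assumption)."""
    total = sum(counter.values())
    if total == 0 or counter[key] == 0:
        return math.log(floor)
    return math.log(counter[key] / total)


def viterbi(words, transitions, emissions):
    """Return the tag sequence T maximising P(W | T) * P(T) under a bigram tag model."""
    if not words:
        return []
    tags = list(emissions)
    # Each column maps a tag to (best log score ending in that tag, backpointer).
    columns = [{
        t: (log_prob(transitions[START], t) + log_prob(emissions[t], words[0]), None)
        for t in tags
    }]
    for word in words[1:]:
        previous = columns[-1]
        column = {}
        for t in tags:
            # Pick the best predecessor tag for t.
            best_prev = max(
                tags, key=lambda p: previous[p][0] + log_prob(transitions[p], t)
            )
            score = (
                previous[best_prev][0]
                + log_prob(transitions[best_prev], t)
                + log_prob(emissions[t], word)
            )
            column[t] = (score, best_prev)
        columns.append(column)
    # Follow the backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

Calling `train` on a list of sentences, each a list of `(word, tag)` pairs, and passing the result to `viterbi` together with a list of words yields one tag per word. The inner search over predecessor tags runs once per tag per word, so the cost per word grows with the square of the tagset size, and relative frequency estimates over so many tags need far more training data. That gives some intuition for why a tagset of 1000+ tags is harder to handle than a small one.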
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1997
Where ANLP