Probabilistic and Rule-Based Tagger of an Inflective Language - a Comparison

We present results of probabilistic tagging of Czech texts to show how these techniques perform on a highly morphologically ambiguous inflective language. After describing the tag system used, we report the results of four experiments that use a simple probabilistic model to tag Czech texts (one unigram, two bigram, and one trigram experiment). For comparison, we applied the same code and settings to tag an English text (another four experiments), using training and test data of the same size so that the validity of the comparison is not in doubt. The experiments use the source-channel model and maximum likelihood training on a hand-tagged Czech corpus and on the tagged Wall Street Journal (WSJ) corpus from the LDC collection. The experiments show (not surprisingly) that the more training data is available, the better the success rate. The results also indicate that for inflective languages with 1000+ tags we have to develop a more sophisticated approach...
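
As a rough illustration of the kind of model the abstract names, the following is a minimal sketch of a bigram source-channel tagger trained by maximum likelihood (relative frequencies) and decoded with Viterbi search. It is not the authors' code: the toy corpus, the tag names (DET, NOUN, VERB), the function names, and the probability floor used in place of proper smoothing are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math

def train(tagged_sentences):
    """Collect maximum-likelihood counts for a bigram source-channel tagger."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def log_mle(counter, key, floor=1e-6):
    """Relative-frequency estimate in log space.

    `floor` is an ad hoc stand-in for smoothing (an assumption of this
    sketch), so unseen events do not produce log(0).
    """
    total = sum(counter.values())
    p = counter[key] / total if total else 0.0
    return math.log(max(p, floor))

def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram model."""
    tags = list(emit)
    # best[t] = (log probability, tag path) of the best path ending in tag t
    best = {
        t: (log_mle(trans["<s>"], t) + log_mle(emit[t], words[0]), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + log_mle(trans[p], t), best[p][1]) for p in tags
            )
            new_best[t] = (score + log_mle(emit[t], word), path + [t])
        best = new_best
    return max(best.values())[1]

# Toy hand-tagged corpus (invented for illustration only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("walk", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("walk", "VERB")],
]
trans, emit = train(corpus)
print(viterbi(["the", "walk", "ends"], trans, emit))
print(viterbi(["dogs", "walk"], trans, emit))
```

In this toy setting the word "walk" is ambiguous between NOUN and VERB, and the tag bigram context is what resolves it. A unigram variant would drop the transition term, and a trigram variant would condition each tag on the two preceding tags.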

Jan Hajic, Barbora Hladká

Ambiguous Inflective Languages | ANLP 1997 | Czech Texts | Inflective Languages |

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	1997
Where	ANLP
Authors	Jan Hajic, Barbora Hladká
