Experiments in Cross-Language Morphological Annotation Transfer

Annotated corpora are valuable resources for NLP, but they are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and that a simple textbook describing the basic morphological properties of that language is available. The paper describes experiments with Polish, Czech, and Russian; however, the method is not tied in any way to these languages. In all the experiments we use the TnT tagger ([3]), a second-order Markov model. Our approach assumes that information acquired about one language can be used for processing a related language. We have found that even breathtakingly naive things (such as approximating the Russian transitions by Czech and/or Polish ones and approximating the Russian emissions by manually or automatically derived Czech cognates) can lead to a significant improvement of the ta...
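The idea of borrowing transitions from a related language and emissions from a cognate lexicon can be illustrated with a toy sketch. The code below is not the TnT tagger and not the authors' implementation: it is a minimal second-order (trigram) hidden Markov model with Viterbi decoding, and it omits the smoothing and unknown-word handling a real tagger needs. The function names, the coarse tag set, the three-word sentences, and all probability values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train_transitions(tagged_sentences):
    """Estimate trigram tag transitions P(t3 | t1, t2) by relative frequency."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            counts[(t1, t2)][t3] += 1
    return {
        context: {tag: n / sum(following.values()) for tag, n in following.items()}
        for context, following in counts.items()
    }


def viterbi(words, tagset, trans, emit, floor=1e-6):
    """Second-order Viterbi: each state is a pair (previous tag, current tag).

    `floor` is a crude stand-in for smoothing (an assumption of this sketch).
    """
    # best[(t1, t2)] = (probability of best path ending in t1 t2, that path)
    best = {("<s>", "<s>"): (1.0, [])}
    for word in words:
        new_best = {}
        for (t1, t2), (prob, path) in best.items():
            for t3 in tagset:
                p_trans = trans.get((t1, t2), {}).get(t3, floor)
                p_emit = emit.get(word, {}).get(t3, floor)
                candidate = prob * p_trans * p_emit
                key = (t2, t3)
                if key not in new_best or candidate > new_best[key][0]:
                    new_best[key] = (candidate, path + [t3])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy "source language" corpus (Czech words, hypothetical coarse tags).
czech_corpus = [
    [("nová", "ADJ"), ("kniha", "NOUN"), ("leží", "VERB")],
    [("kniha", "NOUN"), ("leží", "VERB")],
]

# Transitions are learned from the source language only.
transitions = train_transitions(czech_corpus)

# Toy "target language" emissions P(word | tag), imagined as derived from
# Czech cognates; the numbers are made up. The last word is left ambiguous.
russian_emissions = {
    "новая": {"ADJ": 0.9},
    "книга": {"NOUN": 0.9},
    "лежит": {"VERB": 0.5, "NOUN": 0.5},
}

tags = viterbi(
    ["новая", "книга", "лежит"],
    ["ADJ", "NOUN", "VERB"],
    transitions,
    russian_emissions,
)
print(tags)  # ['ADJ', 'NOUN', 'VERB']
```

In this toy setup the emissions alone cannot decide between NOUN and VERB for the last word; the transitions borrowed from the source-language corpus resolve it, which is the kind of effect the abstract alludes to.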

Anna Feldman, Jirka Hana, Chris Brew

CICLING 2006 | Natural Language Processing | Source Language | Target Language | Unannotated Text Corpus

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	CICLING
Authors	Anna Feldman, Jirka Hana, Chris Brew
