LREC
2010

The English-Swedish-Turkish Parallel Treebank

We describe a syntactically annotated parallel corpus containing English, Swedish and Turkish. The corpus consists of approximately 300 000 tokens in Swedish, 160 000 in Turkish and 150 000 in English, covering both fiction and technical documents. We build the corpus using the Uplug toolkit for automatic structural markup and for sentence and word alignment, together with basic language resource kits for the linguistic analysis of the languages involved. The annotation is carried out on several layers, from morphological and part-of-speech analysis to dependency structures. The treebank is used in teaching and in linguistic research to study the relationship between these structurally different languages.
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Beáta Megyesi, Bengt Dahlqvist, Éva Á. Csató, Joakim Nivre