The English-Swedish-Turkish Parallel Treebank

Download www.lingfil.uu.se

We describe a syntactically annotated parallel corpus of English, Swedish and Turkish. The corpus consists of approximately 300 000 tokens in Swedish, 160 000 in Turkish and 150 000 in English, drawn from both fiction and technical documents. We build the corpus using the Uplug toolkit for automatic structural markup and sentence and word alignment, and basic language resource kits for the linguistic analysis of the languages involved. The annotation is carried out on several layers, from morphological and part-of-speech analysis to dependency structures. The treebank is used in teaching and in linguistic research to study the relationships between these structurally different languages.

Beáta Megyesi, Bengt Dahlqvist, Éva Á. Csató, Joakim Nivre

Automatic Structural Markup | Education | Language Resource Kits | LREC 2010 | Parallel Corpus |

» Cross Language Dependency Parsing using a Bilingual Lexicon

» Automatic Generation of Parallel Treebanks

» Building a Parallel Bilingual Syntactically Annotated Corpus

» Swedish-Turkish Parallel Treebank

» Annotating Predicate-Argument Structure for a Parallel Treebank

» Inducing Sentence Structure from Parallel Corpora for Reordering

» Building a Bilingual ValLex Using Treebank Token Alignment: First Observations

» Urdu and Hindi: Translation and Sharing of Linguistic Resources

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Beáta Megyesi, Bengt Dahlqvist, Éva Á. Csató, Joakim Nivre
