Swedish-Turkish Parallel Treebank


Download www.lrec-conf.org

In this paper, we describe our work on building a parallel treebank for a less-studied and typologically dissimilar language pair, namely Swedish and Turkish. The treebank is a balanced, syntactically annotated corpus containing both fiction and technical documents. In total, it consists of approximately 160,000 tokens in Swedish and 145,000 in Turkish. The texts are linguistically annotated in several layers, ranging from part-of-speech tags and morphological features to dependency annotation. Each layer is processed automatically using basic language resources for the languages involved. The sentences and words are aligned, and the output is partly corrected manually. We create the treebank by reusing and adjusting existing tools for automatic annotation and alignment, and for their correction and visualization. The treebank was developed within the project Supporting research environment for minor languages, aiming to create representative language resources for language pairs dissimilar in languag...
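To make the layered annotation concrete, here is a minimal sketch of how one aligned sentence pair might be represented in memory: each token carries a part-of-speech tag, morphological features, and a dependency head and relation, and the pair carries a set of word alignment links. The class names, field names, tag and relation labels, and the example sentences are all hypothetical illustrations, not the treebank's actual tagsets or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    pos: str               # part-of-speech tag (illustrative tag names, not the treebank's tagset)
    feats: dict[str, str]  # morphological features
    head: int              # 1-based index of the syntactic head; 0 marks the root
    deprel: str            # dependency relation to the head (illustrative labels)


@dataclass
class AlignedSentencePair:
    source: list[Token]    # Swedish side
    target: list[Token]    # Turkish side
    # Word alignment links as (source_index, target_index), 0-based.
    # A set of pairs allows one-to-many and many-to-one links.
    links: set[tuple[int, int]] = field(default_factory=set)

    def aligned_forms(self) -> list[tuple[str, str]]:
        """Return aligned word pairs, ordered by source and then target position."""
        return [(self.source[i].form, self.target[j].form) for i, j in sorted(self.links)]

    def unaligned_target(self) -> list[str]:
        """Return target words that no alignment link covers."""
        covered = {j for _, j in self.links}
        return [t.form for k, t in enumerate(self.target) if k not in covered]


# Hypothetical example: "Jag läser en bok" / "Bir kitap okuyorum" ("I am reading a book").
sv = [
    Token("Jag", "PRON", {"Person": "1"}, 2, "nsubj"),
    Token("läser", "VERB", {"Tense": "Pres"}, 0, "root"),
    Token("en", "DET", {}, 4, "det"),
    Token("bok", "NOUN", {}, 2, "obj"),
]
tr = [
    Token("Bir", "DET", {}, 2, "det"),
    Token("kitap", "NOUN", {}, 3, "obj"),
    Token("okuyorum", "VERB", {"Person": "1", "Number": "Sing"}, 0, "root"),
]
# The Swedish subject pronoun and verb both link to the single Turkish verb,
# whose suffix marks the person of the (dropped) subject.
pair = AlignedSentencePair(sv, tr, {(0, 2), (1, 2), (2, 0), (3, 1)})

print(pair.aligned_forms())
print(pair.unaligned_target())
```

Storing alignments as a set of index pairs rather than a one-to-one mapping matters for a typologically dissimilar pair like Swedish and Turkish, where a separate Swedish word can correspond to a suffix inside a single Turkish word form.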

Beáta Megyesi, Bengt Dahlqvist, Eva Pettersson, Joakim Nivre


Dissimilar Language Pair | Education | Language Pairs | LREC 2008 | Treebank |


» Exploiting Parallel Treebanks to Improve Phrase-Based Statistical Machine Translation

» Cross Language Dependency Parsing using a Bilingual Lexicon

» Automatic Generation of Parallel Treebanks

» Building a Parallel Bilingual Syntactically Annotated Corpus

» Annotating Predicate-Argument Structure for a Parallel Treebank

» Inducing Sentence Structure from Parallel Corpora for Reordering

» Building a Bilingual ValLex Using Treebank Token Alignment: First Observations

» Urdu and Hindi Translation and sharing of linguistic resources

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Beáta Megyesi, Bengt Dahlqvist, Eva Pettersson, Joakim Nivre


Sciweavers
