In this paper we present contrastive colour studies done using COMPARA, the largest edited parallel corpus in the world (as far as we know). The studies were the result of semanti...
This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method f...
We present our efforts to create a large-scale, semi-automatically annotated parallel corpus of cleft constructions. The corpus is intended to reduce or make more effective the ma...
We describe a syntactically annotated parallel corpus containing English, Swedish and Turkish. The corpus consists of approximately 300 000 tokens in Swedish, 160 000 in Turkish a...
After three years of work the Dutch Parallel Corpus (DPC) project has reached an end. The finalized corpus is a ten-million-word high-quality sentence-aligned bidirectional parall...
Statistical machine translation (SMT) requires a large parallel corpus, which is available only for restricted language pairs and domains. To expand the language pairs and domains...
This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By means of parallel corpus transformations accordin...
We present a report on our participation in the English-Dutch bilingual task of the 2001 Cross-Language Evaluation Forum (CLEF). We attempted to demonstrate that good cross languag...
IR with a reference corpus is one approach to relevant-sentence detection: it takes the IR result as the representation of the query (sentence). Lack of informatio...
The “Semantic Mirrors Method” (Dyvik, 1998) is a means for automatic derivation of thesaurus entries from a word-aligned parallel corpus. The method is based on the c...