The Teko corpus composing model offers a decentralized, dynamic way of collecting high-quality text corpora for linguistic research. The resulting corpus consists of independent t...
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resourc...
Juri Ganitkevitch, Chris Callison-Burch, Courtney ...
Parallel corpora are a valuable resource for tasks such as cross-language information retrieval and data-driven natural language processing systems. Previously only small scale cor...
Large corpora are essential to modern methods of computational linguistics and natural language processing. In this paper, we describe an ongoing project whose aim is to build a l...