This paper describes past, ongoing and planned work on the collection and transcription of spoken language samples for all the South African official languages and as part of this...
Large corpora are essential to modern methods of computational linguistics and natural language processing. In this paper, we describe an ongoing project whose aim is to build a l...
Statistical machine translation to morphologically richer languages is a challenging task and more so if the source and target languages differ in word order. Current state-of-the...
The creation of language resources for less-resourced languages, such as historical ones, benefits from the exploitation of language-independent tools and methods developed over th...
In this paper, we construct the Chinese Opinion Treebank, built on the syntactically annotated Chinese Treebank corpus, for research on opinion analysis. We introduce the tagging sc...