In The Low Countries, a major reference corpus for written Dutch is currently being built. In this paper, we discuss the interplay between data acquisition and data processing dur...
The recent availability of large collections of text such as the Google 1T 5-gram corpus (Brants and Franz, 2006) and the Gigaword corpus of newswire (Graff, 2003) have made it po...
We present in this paper an on-going research: the construction and annotation of a Romanian Generative Lexicon (RoGL). Our system follows the specifications of CLIPS project for ...
This paper introduces a novel corpus of natural language dialogues obtained from humans performing a cooperative, remote, search task (CReST) as it occurs naturally in a variety o...
Kathleen M. Eberhard, Hannele Nicholson, Sandra K&...
An increasing demand for new language resources of recent EU members and accessing countries has in turn initiated the development of different language tools and resources, such ...
Sanja Seljan, Marko Tadic, Zeljko Agic, Jan Snajde...
This paper presents the Kachna Corpus of Spontaneous Speech, in which ten Czech and ten Norwegian speakers were recorded both in their native language and in English. The dialogue...
Language resources are typically defined and created for application in speech technology contexts, but the documentation of languages which are unlikely ever to be provided with ...
Annotation of digital recordings in humanities research still is, to a large extend, a process that is performed manually. This paper describes the first pattern recognition based...
Eric Auer, Albert Russel, Han Sloetjes, Peter Witt...
Electronic dictionaries offer many possibilities unavailable in paper dictionaries to view, display or access information. However, even these resources fall short when it comes t...
A corpus called DutchParl is created which aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains d...