Language resources are typically defined and created for application in speech technology contexts, but the documentation of languages which are unlikely ever to be provided with ...
Annotation of digital recordings in humanities research is still, to a large extent, a process that is performed manually. This paper describes the first pattern recognition based...
Eric Auer, Albert Russel, Han Sloetjes, Peter Witt...
Electronic dictionaries offer many possibilities unavailable in paper dictionaries to view, display or access information. However, even these resources fall short when it comes t...
A corpus called DutchParl has been created, which aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains d...
This paper presents the procedure of the syntactic annotation of the National Corpus of Polish. Syntactic annotation consists here of shallow parsing and manual post-editing of th...
India is a multilingual country where machine translation and cross-lingual search are highly relevant problems. These problems require large resources, like wordnets and lexicons...
Manual text annotation is a resource-consuming endeavor necessary for NLP systems when they target new tasks or domains for which there are no existing annotated corpora. Distribu...
Emilia Apostolova, Sean Neilan, Gary An, Noriko To...
We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during our ongo...
Alistair Willis, David King, David Morse, Anton Di...
Conventional approaches to disambiguation problems have used statistical methods based on the co-occurrence of words in their contexts. It seems that human beings assign appropriate w...
We report on tools for the extraction of German multiword expressions (MWEs) from text corpora; we extract word pairs, but also longer MWEs of different patterns, e.g. verb-nou...