The paper presents Bulgarian National Corpus project (BulNC) - a large-scale, representative, online available corpus of Bulgarian. The BulNC is also a monolingual general corpus,...
The Semantic Web is facing the important challenge to maintain its promise of a real world-wide graph of interconnected resources. Unfortunately, while URIs almost guarantee a dir...
The goal of this paper is to investigate French word segmentation strategies using phonemic and lexical transcriptions as well as prosodic and part-of-speech annotations. Average ...
We describe two corpora of question and answer pairs collected for complex, open-domain Question Answering (QA) to enable answer classification and re-ranking experiments. We deli...
We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility for statistical analysis of features specific t...
The aim of the paper is to present recent -- as of March 2010 -- developments in the construction of the National Corpus of Polish (NKJP). The NKJP project was launched at the ver...
We present the Spontal database of spontaneous Swedish dialogues. 120 dialogues of at least 30 minutes each have been captured in high-quality audio, high-resolution video and wit...
Jens Edlund, Jonas Beskow, Kjell Elenius, Kahl Hel...
This paper describes the design and collection of NameDat, a database containing English proper names spoken by native Norwegians. The database was designed to cover the typical a...
In this paper, we present work on enhancing the basic data resource of a context-aware system. First, we introduce a supervised approach to extracting geographical relations on a ...