Most of the known stochastic sentence generators use syntactically annotated corpora, performing the projection to the surface in one stage. However, in full-fledged text generati...
Bernd Bohnet, Leo Wanner, Simon Mille, Alicia Burg...
The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data ...
We describe the semi-automatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which TimeML annotated data was not available yet. In order to va...
This paper describes an interdisciplinary approach which brings together the fields of corpus linguistics and translation studies. It presents ongoing work on the creation of a co...
Background: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationship...
Sampo Pyysalo, Filip Ginter, Juho Heimonen, Jari B...
This paper presents AnCora, a multilingual corpus annotated at different linguistic levels consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At presen...
This paper discusses the problem of utilising multiply annotated data in training biomedical information extraction systems. Two corpora, annotated with entities and relations, an...
This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as fe...
Patrizia Paggio, Jens Allwood, Elisabeth Ahlsen, K...
This paper describes the process of building a newspaper corpus annotated with events described in specific documents. The main difference to the corpora built as part of the TDT ...
This paper presents the LSLink (or Life Science Link) methodology that provides users with a set of tools to explore the rich Web of interconnected and annotated objects in multipl...