We describe techniques for the automatic detection of relationships among domain entities (e.g. genes, proteins, diseases) mentioned in the biomedical literature. Our approach is ...
Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand,...
We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary appli...
Georg Rehm, Marina Santini, Alexander Mehler, Pave...
Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora. The system i...
Grzegorz Chrupala, Georgiana Dinu, Josef van Genab...
We introduce the corpus of United States Congressional bills from 1947 to 1998 for use by language research communities. The U.S. Policy Agenda Legislation Corpus Volume 1 (USPALC...
In this paper, we describe a unifying approach to tackle data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Mar...
Marc Kemps-Snijders, Claus Zinn, Jacquelijn Ringer...
The "download first, then process paradigm" is still the predominant working method amongst the research community. The web-based paradigm, however, offers many advantag...
Marc Kemps-Snijders, Alexander Klassmann, Claus Zi...
This paper describes a newly created text corpus of news articles that has been annotated for cross-document co-reference. Being able to robustly resolve references to entities ac...
David Day, Janet Hitzeman, Michael L. Wick, Keith ...
As huge quantities of documents have become available, services using natural language processing technologies trained by huge corpora have emerged, such as information retrieval ...
This paper discusses the use of computational linguistic technology to extract definitions from a large corpus of German court decisions. We present a corpus-based survey of defin...
Air traffic control (ATC) is based on voice communication between pilots and controllers and uses a highly task and domain specific language. Due to this very reason, spoken langu...