This paper presents a new corpus project that aims to build a national corpus of Polish. What distinguishes it from a typical YACP (Yet Another Corpus Project) is 1) the fact t...
Large-scale grammar-based parsing systems nowadays increasingly rely on independently developed, more specialized components for pre-processing their input. However, different too...
Peter Adolphs, Stephan Oepen, Ulrich Callmeier, Be...
We present the procedures we implemented to carry out system-oriented evaluation of a syntax-based word aligner, ALIBI. We take the approach of regarding cross-corpus evaluation ...
Most of the work described in this paper was conducted as part of the Recovering Evidence from Video by fusing Video Evidence Thesaurus and Video MetaData (REVEAL) project, sp...
Speaker identification and verification systems perform poorly when models are trained in one language and tested in another. This situation is not unu...
Patients require access to Electronic Patient Records; however, medical language is often too difficult for patients to understand. Explaining records to patients is a time consumi...
The Arabic Treebank (ATB), released by the Linguistic Data Consortium, contains multiple annotation files for each source file, due in part to the role of diacritic inclusion in t...
In this paper, we define the task of Number Identification in natural context. We present and validate a language-independent semiautomatic approach to quickly building a gold sta...
Despite the importance of lexical resources for a number of NLP applications (Machine Translation, Information Extraction, Event Detection and Tracking, Question Answering, amo...
We present work on a three-stage system to detect and classify disfluencies in multi-party dialogues. The system consists of a regular-expression-based module and two machine lear...
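To make the idea of a rule-based first stage concrete, the following is a minimal sketch of what a regular-expression disfluency tagger might look like. It is not the authors' module: the abstract does not give their patterns, so the filled-pause list, the repetition rule, the labels, and the function name `tag_disfluencies` are all hypothetical. The sketch flags filled pauses (e.g. "uh", "um") and immediate word repetitions (e.g. "I I") and returns labelled character spans.

```python
import re

# Hypothetical filled-pause inventory; the paper's actual patterns are not given.
FILLED_PAUSE = re.compile(r"\b(?:uh|um|er|erm|hmm)\b", re.IGNORECASE)

# An immediately repeated whole word, e.g. "I I think"; \1 refers back to the
# first captured word, and the trailing \b prevents "the theory" from matching.
REPETITION = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def tag_disfluencies(utterance: str) -> list[tuple[str, int, int, str]]:
    """Return (label, start, end, text) spans found by the rule-based stage."""
    spans = []
    for m in FILLED_PAUSE.finditer(utterance):
        spans.append(("FILLED_PAUSE", m.start(), m.end(), m.group(0)))
    for m in REPETITION.finditer(utterance):
        spans.append(("REPETITION", m.start(), m.end(), m.group(0)))
    # Sort by start offset so later stages see spans in reading order.
    return sorted(spans, key=lambda s: s[1])


if __name__ == "__main__":
    for span in tag_disfluencies("I I think um we should uh go"):
        print(span)
```

In a pipeline like the one described, spans of this kind could plausibly be passed as features or candidates to the subsequent machine-learning stages; how the authors actually connect their three modules is not stated in the abstract.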