This paper describes the Acoustic Event Detection (AED) system developed at the UPC and reports its results in the CLEAR evaluations carried out in March 2007. The sys...
Commonly used coreference resolution evaluation metrics can only be applied to key mentions, i.e. mentions that are already annotated. Here we propose two variants of the B³ and CEAF coref...
The GIVE Challenge is a recent shared task in which NLG systems are evaluated over the Internet. In this paper, we validate this novel NLG evaluation methodology by comparing the ...
Alexander Koller, Kristina Striegnitz, Donna Byron...
The evaluation and assessment of physicians-in-training (house staff) is a complex task. Residency training programs are under increasing pressure [1] to provide accurate and comp...
The evaluation of a large implemented natural language processing system involves more than its application to a common performance task. Such tasks have been used in the message u...