As the client's first interface, call centres worldwide contain a huge amount of information of all kinds in the form of conversational speech. If accessible, this infor...
Martine Garnier-Rizet, Gilles Adda, Frederik Caill...
Named Entities (NE) are regarded as an important type of semantic knowledge in many natural language processing (NLP) applications. Originally, a limited number of NE categories w...
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has...
We evaluate the extent to which the distinction between semantically core and non-core dependents as used in the FrameNet corpus corresponds to the traditional distinction between...
The PIT corpus is a German multi-media corpus of multi-party dialogues recorded in a Wizard-of-Oz environment at the University of Ulm. The scenario involves two human dialogue pa...
As a first step toward developing systems that enable non-native speakers to output near-perfect English sentences for given mixed English-Japanese sentences, we propose new approaches...
This paper presents ongoing work dedicated to parsing the textual structure of procedural texts. We propose here a model for the instructional structure and criteria to identify it...
About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places aro...
Paul Trilsbeek, Daan Broeder, Tobias Valkenhoef, P...
We present in this article the methods we used for obtaining measures to ensure the quality and well-formedness of a text corpus. These measures allow us to determine the compatib...