LREC
2010

The Nijmegen Corpus of Casual Spanish

This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around 30 hours of recordings of 52 Madrid Spanish speakers engaged in conversations with friends. The orthographic transcription contains around 393 000 word tokens and 16 500 word types. Casual speech was elicited following a procedure similar to that used for the creation of the Nijmegen Corpus of Casual French (Torreira et al., 2010). The recordings consisted of three different parts, which together provided around ninety minutes of speech from every group of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. The resulting corpus is a rich resource of highly casual speech that can be effectively exploited by researchers in language science and technology. Information about how to obtain a c...
Francisco Torreira, Mirjam Ernestus
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC