The Nijmegen Corpus of Casual Spanish

This article describes the preparation, recording and orthographic transcription of a new speech corpus, the Nijmegen Corpus of Casual Spanish (NCCSp). The corpus contains around 30 hours of recordings of 52 Madrid Spanish speakers engaged in conversations with friends. The orthographic transcription contains around 393 000 word tokens and 16 500 word types. Casual speech was elicited following a procedure similar to that used for the creation of the Nijmegen Corpus of Casual French (Torreira et al., 2010). The recordings consisted of three different parts, which together provided around ninety minutes of speech from every group of speakers. While Parts 1 and 2 did not require participants to perform any specific task, in Part 3 participants negotiated a common answer to general questions about society. The resulting corpus is a rich resource of highly casual speech that can be effectively exploited by researchers in language science and technology. Information about how to obtain a c...

Francisco Torreira, Mirjam Ernestus

Casual Speech | Education | LREC 2010 | Nijmegen Corpus | Orthographic Transcription

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Francisco Torreira, Mirjam Ernestus
