We present an overview of the data collection and transcription efforts for the COnversational Speech In Noisy Environments (COSINE) corpus. The corpus is a set of multi-party conversations recorded in real-world environments with background noise, and it can be used to train noise-robust speech recognition systems. We explain the motivation for creating such a corpus and describe the audio recordings and transcriptions that make it up. The recordings include a 4-channel microphone array as well as close-talking, far-field, and throat microphones, each captured on separate synchronized channels. This combination of sensors allows for algorithm research that is not possible with single-microphone corpora.
Alex Stupakov, Evan Hanusa, Jeff A. Bilmes, Dieter