Abstract: Especially for tasks like automatic meeting transcription, it would be useful to recognize speech automatically even while multiple speakers are talking simultaneously....
Dorothea Kolossa, Shoko Araki, Marc Delcroix, Tomo...
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pa...
We investigate various ways of generating prosodic syllable contour features that have recently been applied to enhance systems for speaker recognition. We compare different appro...
Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision...
In this paper, we propose a new approach for extracting and representing prosodic features directly from the speech signal. We hypothesize that prosody is linked to linguistic uni...