In this paper, we address the problem of estimating who is speaking from automatically extracted low-resolution visual cues in group meetings. Traditionally, the task of speech/non...
The number of video clips available online is growing at a tremendous pace. Conventionally, user-supplied metadata text, such as the title of the video and a set of keywords, has ...
Mehmet Emre Sargin, Hrishikesh Aradhye, Pedro J. M...
The Degenerate Unmixing Estimation Technique (DUET) is a Blind Source Separation (BSS) algorithm for stereo audio. DUET depends on an amplitude-phase 2D histogram built from the d...
This paper describes recent advances at LIMSI in Mandarin Chinese speech-to-text transcription. A number of novel approaches were introduced in the different system components. Th...
Lori Lamel, Jean-Luc Gauvain, Viet-Bac Le, Ilya Op...
The “tandem approach” in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a...