This paper presents a new approach to speech synthesis in which a set of cross-word decision-tree state-clustered context-dependent hidden Markov models are used to define a set o...
This work introduces a robot driven camera controlled by speech. The SIMIS database of 20 recordings of real life surgical operations serves as basis for analyses and noise modell...
We propose a novel multi-stream framework for continuous conversational speech recognition which employs bidirectional Long Short-Term Memory (BLSTM) networks for phoneme predicti...
High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exe...
Antti Hurmalainen, Jort F. Gemmeke, Tuomas Virtane...
With the rapidly growing use of the audio and multimedia information over the Internet, the technology for retrieving speech information using voice queries is becoming more and mo...