Basic language-inherent tempo cannot be isolated by the current metrics of speech rhythm. Here we propose the number of syllables per intonation unit as an appropriate measure, al...
The Lombard effect refers to the speech changes due to the immersion of the speaker in a noisy environment. Among these changes, studies have already reported acoustic modificatio...
This paper presents an improved wavelet-based dereverberation method for automatic speech recognition (ASR). Dereverberation is based on filtering reverberant wavelet coefficients...
Log-linear models have recently been used in acoustic modeling for speech recognition systems. This has been motivated by competitive results compared to systems based on Gaussian...
Dynamic modeling of spoken dialogue seeks to capture how interlocutors change their speech over the course of a conversation. Much work has focused on how speakers adapt or entrai...
In an attempt to improve models of human perception, the recognition of phonemes in nonsense utterances was predicted with automatic speech recognition (ASR) in order to analyze i...
Speech can be divided into discourse genres based on the contextual environment it occurs in (e.g. political speech, sport commentary speech, etc.). The present study investigated...
Nicolas Obin, Volker Dellwo, Anne Lacheret, Xavier...
Neural network language models (NNLM) have become an increasingly popular choice for large vocabulary continuous speech recognition (LVCSR) tasks, due to their inherent generalisa...
Junho Park, Xunying Liu, Mark J. F. Gales, Philip ...
We present a system for quickly and cheaply building transcribed speech corpora containing utterances from many speakers in a variety of acoustic conditions. The system consists o...
Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu...
In this paper, we demonstrate the trichotomic realization of voiced velars in Japanese, challenging the traditional plosive/nasal dichotomy of velar allophones, and examine the di...