In this paper we share our experience and describe the methodologies that we have used in designing and recording large speech databases for applications requiring speech synthesi...
We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech ...
Xing Fan, Michael L. Seltzer, Jasha Droppo, Henriq...
We propose a new two-stage framework for joint analysis of head gesture and speech prosody patterns of a speaker toward automatic realistic synthesis of head gestures from speech p...
Recently, increasing attention has been paid to Mandarin word stress, which is important for improving the naturalness of speech synthesis. Most of the research on Mandarin spee...
Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoy...
We describe how to use machine learning techniques to create a generative, videorealistic speech animation module. A human subject is first recorded using a video camera as he/she u...