This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of t...
Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, ...
We focus in this paper on the named entity recognition task in spoken data. The proposed approach investigates the use of various contexts of the words to improve recognition. Exp...
In this paper we describe and analyze a data pruning method in combination with template-based automatic speech recognition. We demonstrate the positive effects of polishing the t...
In this paper, we present efficient HMM-based techniques for estimating missing features. By assuming speech features to be observations of hidden Markov processes, we derive a mi...
The context in which a speech-driven application is used (or conversely not used) can be an important signal for recognition engines, and for spoken interface design. Using large-...
In this paper, we analyze whether dictionaries from the World Wide Web that contain phonetic notations may support the rapid creation of pronunciation dictionaries within the sp...
The perceived quality of a synthetic visual speech signal greatly depends on the smoothness of the presented visual articulators. This paper explains how concatenative visual spee...
Continuous speech input for ASR processing is usually presegmented into speech stretches by pauses. In this paper, we propose that smaller, prosodically defined units can be ident...
Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang,...