In this paper we present a new adaptive short-time Fourier analysis-synthesis scheme and demonstrate its efficacy in speech enhancement. While a number of adaptive analyses have p...
Daniel Rudoy, Prabahan Basu, Thomas F. Quatieri, B...
This paper looks at a parsing-based alternative to word error rate (WER) for optimizing recognition, SParseval, hypothesizing that it may be a better objective for applications su...
Dustin Hillard, Mei-Yuh Hwang, Mary P. Harper, Mar...
This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our wo...
Audio segmentation has received increasing attention in recent years for its potential applications in automatic indexing and transcription of audio data. Among existing audio seg...
In this paper, we cast discriminative training problems into standard linear programming (LP) optimization. Besides being convex and having globally optimal solution(s), LP progra...
ITU-T has selected the candidate submitted by Ericsson, Nokia, Motorola, VoiceAge, and Texas Instruments as the baseline for the G.EV-VBR coding standard. G.EV-VBR is an embedded ...
In recent research, we have proposed a high-accuracy bottom-up detection-based paradigm for continuous phone speech recognition. The key component of our system was a bank of arti...
Automatic speech recognition (ASR) systems have been developed only for a very limited number of the estimated 7,000 languages in the world. In order to avoid the evolvement of a ...
Motivated by linguistic theories of prosodic categoricity, symbolic representations of prosody have recently attracted the attention of speech technologists. Categorical represent...