This paper considers the enhancement of noisy speech. Earlier studies have shown that enhancing spectral envelopes with prior knowledge of the all-pole (AP) model parameters of clean speech, learnt from speech corpora, is advantageous in terms of both musical noise and speech distortion. This paper proposes a new speech enhancement method that incorporates harmonic structure enhancement into learning-based spectral envelope enhancement to further improve performance. The harmonic structure is represented by a harmonic Gaussian mixture model (GMM), which is parameterized by a voicing indicator and a fundamental frequency. The parameters of the AP model and the harmonic GMM are jointly estimated by maximum a posteriori estimation, which enables spectral envelopes and harmonic structures to be enhanced in a unified framework. In terms of cepstral distance, the proposed method outperforms the spectral envelope enhancement approach by 0.85 dB.
Takuya Yoshioka, Tomohiro Nakatani, Hiroshi G. Oku
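
As an illustration of the harmonic GMM mentioned in the abstract, the following minimal sketch evaluates a mixture of Gaussians centred at integer multiples of a fundamental frequency, switched by a voicing indicator. The abstract does not specify the parameterization, so the function name `harmonic_gmm`, the equal component weights, the fixed Gaussian width `sigma_hz`, the number of harmonics, and the flat response for unvoiced frames are all assumptions made here for illustration, not the authors' formulation.

```python
import numpy as np


def harmonic_gmm(freqs_hz, f0_hz, voiced, n_harmonics=10, sigma_hz=20.0):
    """Evaluate a hypothetical harmonic Gaussian mixture over frequency.

    freqs_hz    : frequencies (Hz) at which to evaluate the model
    f0_hz       : fundamental frequency (Hz)
    voiced      : voicing indicator (True for voiced, False for unvoiced)
    n_harmonics : number of Gaussian components (assumed value)
    sigma_hz    : standard deviation shared by all components (assumed value)
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)

    if not voiced:
        # Assumption: an unvoiced frame has no harmonic structure,
        # modelled here as a flat (all-ones) response.
        return np.ones_like(freqs_hz)

    # Component means sit at f0, 2*f0, ..., n_harmonics*f0.
    centers = f0_hz * np.arange(1, n_harmonics + 1)

    # Broadcast to a (n_freqs, n_harmonics) array of normal densities.
    diff = freqs_hz[:, None] - centers[None, :]
    comps = np.exp(-0.5 * (diff / sigma_hz) ** 2) / (sigma_hz * np.sqrt(2.0 * np.pi))

    # Assumption: equal mixture weights 1 / n_harmonics.
    return comps.mean(axis=1)


if __name__ == "__main__":
    freqs = np.arange(0.0, 1000.0, 1.0)
    h = harmonic_gmm(freqs, f0_hz=100.0, voiced=True, n_harmonics=5)
    print("value at a harmonic (200 Hz):", h[200])
    print("value between harmonics (250 Hz):", h[250])
```

In this sketch the voiced response peaks at multiples of the fundamental frequency and falls off between them, which is the property a harmonic structure model needs to capture. In a joint estimation scheme such as the one the abstract describes, the fundamental frequency and the voicing indicator would be estimated together with the AP parameters rather than supplied by hand.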