It is well known that the classical linear predictive model for speech fails to take into account the quasi-periodic nature of the glottal flow typical of voiced speech. In this article, we describe how to incorporate an estimate of the glottal flow directly into the traditional linear prediction framework, through the use of flexible basis function expansions that admit efficient estimation procedures. As we show, this not only allows for improved estimation of vocal tract transfer function parameters in a manner that is robust to pitch variation, but also removes the need for the nonlinear optimization procedures typically required in glottal waveform estimation. We illustrate our approach with experiments using synthesized and real speech waveforms, and show how it may be used to directly estimate the relative degree of voicing and aspiration present in a given utterance.
Maria A. Berezina, Daniel Rudoy, Patrick J. Wolfe