The paper presents a voice conversion framework suitable for real-time applications. The conversion technique is based on a hybrid (deterministic/stochastic) parametric representation of speech. The approach has been tested in two variants, one for narrow-band and one for wide-band speech signals. Although the real-time requirement imposes significant limitations (frame-by-frame processing), the approach provides high-quality reconstructed speech and keeps the target speaker’s identity recognizable, owing to improved feature mapping. The proposed solution is embedded in a mobile communication system as an entertainment service.
Elias Azarov, Alexander A. Petrovsky