—This paper describes a fully automated framework to generate realistic head motion, eye gaze, and eyelid motion simultaneously based on live (or recorded) speech input. Its cent...
Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of fea...
Abdel-rahman Mohamed, Tara N. Sainath, George Dahl...
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and prior knowledge of the statistics of the underlying source signals. ...
Ron J. Weiss, Michael I. Mandel, Daniel P. W. Elli...
We discuss an identification framework for noisy speech mixtures. A block-based generative model is formulated that explicitly incorporates the time-varying harmonic plus noise (H...
In this paper we address the application of single sensor source separation techniques to mixtures of speech and music. Three strategies for source modeling are presented, namely ...