Trainable Videorealistic Speech Animation

We describe how to use machine learning techniques to create a generative, videorealistic speech animation module. A human subject is first recorded with a video camera while uttering a predetermined speech corpus. After the corpus is processed automatically, a visual speech module is learned from the data; it can synthesize the subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence that contains natural head and eye movement. The final output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned. The two key contributions of this paper are 1) a variant of the multidimensional morphable model (MMM) to synthesize new, previously unseen mouth co...

Tony Ezzat, Gadi Geiger, Tomaso Poggio

Biometrics | FGR 2004 | Human Subject | Speech Animation Module | Visual Speech Module |

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	FGR
Authors	Tony Ezzat, Gadi Geiger, Tomaso Poggio
