INTERSPEECH 2010

HMM-based automatic visual speech segmentation using facial data

We describe automatic visual speech segmentation using facial data captured by a stereo-vision technique. Segmentation is performed with an HMM-based forced-alignment mechanism widely used in automatic speech recognition. The idea rests on the assumption that training on visual speech data alone can capture what is unique to the facial component of speech articulation: the asynchrony (time lags) between visual and acoustic speech segments and the significant coarticulation effects. This should reveal the extent to which a phoneme visually affects its surrounding phonemes, providing information valuable for labeling visual speech segments by their dominant coarticulatory contexts.
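The forced-alignment mechanism the abstract refers to can be illustrated with a minimal Viterbi sketch. This is not the authors' system: it assumes one single-Gaussian state per phoneme in a left-to-right chain, whereas real recognizers use multi-state HMMs with GMM emissions and richer transition models. The function name forced_align, the phone inventory, and the feature values are all hypothetical.

```python
import numpy as np

def forced_align(features, phone_means, phone_vars, phones, self_loop=0.9):
    """Viterbi forced alignment of a feature sequence to a known phone
    sequence, using one diagonal-Gaussian state per phone (a deliberate
    simplification of a real HMM-based aligner)."""
    T, S = len(features), len(phones)
    log_self = np.log(self_loop)          # probability of staying in a phone
    log_next = np.log(1.0 - self_loop)    # probability of advancing a phone

    def log_emit(s, x):
        mu, var = phone_means[phones[s]], phone_vars[phones[s]]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    delta = np.full((T, S), -np.inf)      # best log-score ending in state s at t
    back = np.zeros((T, S), dtype=int)    # backpointers for traceback
    delta[0, 0] = log_emit(0, features[0])  # alignment must start in first phone
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s] + log_self
            enter = delta[t - 1, s - 1] + log_next if s > 0 else -np.inf
            if stay >= enter:
                delta[t, s], back[t, s] = stay, s
            else:
                delta[t, s], back[t, s] = enter, s - 1
            delta[t, s] += log_emit(s, features[t])

    # alignment must end in the last phone; trace the state path backwards
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    path.reverse()
    return [phones[s] for s in path]      # frame-level phone labels

# toy usage with hypothetical 2-D visual features and three phones
rng = np.random.default_rng(0)
phones = ["sil", "b", "a"]
means = {"sil": np.zeros(2), "b": np.ones(2), "a": -np.ones(2)}
vars_ = {p: np.ones(2) for p in phones}
feats = np.concatenate([rng.normal(means[p], 1.0, (5, 2)) for p in phones])
print(forced_align(feats, means, vars_, phones))
```

Training such models on visual features alone, as the abstract proposes, would let the Gaussian parameters and the resulting segment boundaries reflect the timing of facial articulation rather than the acoustics.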
Type Conference
Year 2010
Where INTERSPEECH
Authors Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger