Audio-visual anticipatory coarticulation modeling by human and machine



The phenomenon of anticipatory coarticulation provides a basis for the observed asynchrony between the acoustic and visual onsets of phones in certain linguistic contexts. This type of asynchrony is typically not modeled explicitly in audio-visual speech models. In this work, we study within-word audio-visual asynchrony using manual labels of words in which theory suggests that such asynchrony should occur, and show that these hand labels confirm the theory. We then introduce a new statistical model of audio-visual speech, the asynchrony-dependent transition (ADT) model. This model allows asynchrony between audio and video states within word boundaries, where the audio and video state transitions depend not only on the state of that modality, but also on the instantaneous asynchrony. The ADT model outperforms a baseline synchronous model in mimicking the hand labels in a forced alignment task, and its behavior as parameters are changed conforms to our expectations about anticipa...
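The idea of a transition that depends on the instantaneous asynchrony can be illustrated with a minimal sketch. This is not the parameterization used in the paper, which the abstract does not give. The logistic form, the function names `advance_prob` and `joint_transition`, and the parameters `base`, `weight`, and `max_async` are all assumptions made for illustration. Each stream moves through a left-to-right state sequence, and a stream that is ahead of the other becomes less likely to advance further.

```python
import math


def advance_prob(base, lead, weight):
    """Probability that a stream advances to its next state.

    base   -- advance probability at zero asynchrony (0 < base < 1)
    lead   -- this stream's state index minus the other stream's
    weight -- hypothetical strength of the asynchrony penalty
    """
    # Assumed logistic form: being ahead (lead > 0) lowers the logit.
    logit = math.log(base / (1.0 - base)) - weight * lead
    return 1.0 / (1.0 + math.exp(-logit))


def joint_transition(audio_state, video_state,
                     base=0.3, weight=1.0, max_async=2):
    """Distribution over the next (audio, video) state pair."""
    d = video_state - audio_state  # positive when video leads audio
    p_audio = advance_prob(base, -d, weight)
    p_video = advance_prob(base, d, weight)
    # Hypothetical hard cap: the leading stream may not pull further ahead.
    if d >= max_async:
        p_video = 0.0
    if -d >= max_async:
        p_audio = 0.0
    return {
        (audio_state, video_state): (1 - p_audio) * (1 - p_video),
        (audio_state + 1, video_state): p_audio * (1 - p_video),
        (audio_state, video_state + 1): (1 - p_audio) * p_video,
        (audio_state + 1, video_state + 1): p_audio * p_video,
    }


# Video is one state ahead of audio, as in anticipatory lip rounding.
dist = joint_transition(audio_state=2, video_state=3)
```

With `weight=0` the two streams advance independently with probability `base`. Raising `weight` pulls the streams back toward synchrony.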



Asynchrony | Audio-visual Speech | INTERSPEECH 2010 | Model | Signal Processing |


Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	INTERSPEECH
Authors	Louis H. Terry, Karen Livescu, Janet B. Pierrehumbert, Aggelos K. Katsaggelos
