Being able to animate a speech production model with articulatory data would open applications in many domains. In this paper, we first consider the problem of acquiring articulatory data from non-invasive imaging and sensor modalities: dynamic ultrasound (US) images, stereovision 3D data, electromagnetic sensors, and MRI. We focus in particular on automatic registration methods that enable the fusion of the articulatory features in a common frame. We then derive articulatory parameters by fitting these features with Maeda's model. To our knowledge, this is the first attempt to derive articulatory parameters from features automatically extracted and registered across modalities. Results demonstrate the soundness of the approach and the reliability of the fused articulatory data.
M. Aron, A. Toutios, M.-O. Berger, E. Kerrien