Reconstructing the full contour of the tongue from the position of 3 to 4 landmarks on it is useful in articulatory speech work. This can be done with submillimetric accuracy using nonlinear predictive mappings trained on hundreds or thousands of contours extracted from ultrasound images. Collecting and segmenting this amount of data from a speaker is difficult, so a more practical solution is to adapt a well-trained model from a reference speaker to a new speaker using a small amount of data from the latter. Previous work proposed an adaptation model with only 6 parameters and demonstrated fast, accurate results using data from one speaker only. However, the estimates of this model are biased, and we show that, when adapting to a different speaker, its performance stagnates quickly with the amount of adaptation data. We then propose an unbiased adaptation approach, based on local transformations at each contour point, that achieves a significantly lower reconstruction error with a mo...
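To make the core idea concrete, here is a minimal, self-contained sketch of the kind of nonlinear predictive mapping the abstract describes: reconstructing a full contour from a few landmarks on it. This is not the paper's model; it uses synthetic contours and a plain radial-basis-function (RBF) regression, and all names, sizes, and parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set: each "contour" is 24 2-D points on a smooth curve;
# the predictors are 3 landmarks sampled from that contour (an assumption,
# standing in for real ultrasound-extracted tongue contours).
n_train, n_pts = 200, 24
t = np.linspace(0.0, np.pi, n_pts)
lm_idx = [3, 11, 19]  # indices of the landmark points along the contour

def make_contour(a, b):
    # Two-parameter family of smooth curves, flattened to a 2*n_pts vector.
    x = np.cos(t) * (1 + a)
    y = np.sin(t) * (1 + b)
    return np.column_stack([x, y]).ravel()

params = rng.uniform(-0.2, 0.2, size=(n_train, 2))
C = np.array([make_contour(a, b) for a, b in params])                 # (200, 48)
L = C.reshape(n_train, n_pts, 2)[:, lm_idx, :].reshape(n_train, -1)   # (200, 6)

# RBF regression f(l) = W @ phi(l), phi_k(l) = exp(-||l - c_k||^2 / (2 s^2)).
centers = L[rng.choice(n_train, 30, replace=False)]
s = np.median(np.linalg.norm(L[:, None] - centers[None], axis=2))

def phi(Lq):
    d2 = ((Lq[:, None, :] - centers[None]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * s**2))

# Ridge-regularized least-squares fit of the output weights.
Phi = phi(L)
lam = 1e-6
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ C)

# Reconstruct held-out contours from their landmarks and report RMSE.
test_params = rng.uniform(-0.2, 0.2, size=(20, 2))
Ct = np.array([make_contour(a, b) for a, b in test_params])
Lt = Ct.reshape(20, n_pts, 2)[:, lm_idx, :].reshape(20, -1)
rmse = np.sqrt(((phi(Lt) @ W - Ct) ** 2).mean())
print(f"test RMSE: {rmse:.4f}")
```

Speaker adaptation, the subject of the abstract, would then amount to learning a small transformation (global with few parameters, or local per contour point) that maps a new speaker's landmark/contour space onto the reference speaker's before applying a mapping like the one above.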
Chao Qin, Miguel Á. Carreira-Perpiñán