Abstract:
Articulatory modelling of the vocal tract and speech production
organs has long been essentially limited to the midsagittal plane,
which posed problems in particular for modelling lateral consonants
and determining area functions. Thanks to the increasing
availability of Magnetic Resonance Imaging (MRI) devices and of
image processing tools, it has become possible to acquire 3D
vocal tract articulatory data with reasonable acquisition speed. MR
images of the tongue, and front and profile video images of the
subject's face marked with small beads have thus been recorded for
one subject producing the French oral vowels and all non-nasal
French consonants artificially sustained in three symmetric vowel
contexts [a i u]. The modelling approach was then based on a
decomposition into linear components of the coordinates defining the
3D geometry of the different organs extracted from these data.
Specifically, it consisted in alternating between Principal
Component Analysis (PCA), which delivers optimal factors explaining
the maximum of data variance with a minimum number of factors, and
Linear Component Analysis, where factors are
arbitrarily imposed by the user. The advantage of imposing some of
the factors arbitrarily lies in the possibility of using direct
articulatory measures (e.g. jaw height) as factors, or of
controlling the nature and distribution of the variance explained by
a factor (to make it more interpretable as a control parameter
for a model), though at the cost of sub-optimal variance
explanation. Six parameters were found to control the tongue: Jaw
Height, Tongue Body, Tongue Dorsum, Tongue Advance, and two Tongue
Tip parameters, with an overall RMS reconstruction error of 0.16 cm. It
was also found that the lips and face can be driven by five
parameters: the same Jaw Height, Jaw Advance, Lip Protrusion, Lip
Height, and Lip vertical elevation, leading to an RMS
reconstruction error of 0.1 cm. Nomograms corresponding to these
parameters are exemplified in the text and fully provided in the
additional files. In particular, the tongue model made it possible
to handle the lateral consonants. Finally, these models have been
integrated into the ICP virtual talking head that will be
demonstrated at the conference.
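
The alternation between imposed factors and PCA described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that imposing a factor amounts to a least-squares regression of the centred coordinates on a chosen measure (here a synthetic jaw-height signal), followed by PCA on the residual. All function names, the synthetic data, and the per-coordinate RMS definition are hypothetical choices made for illustration.

```python
import numpy as np

def imposed_factor(X, predictor):
    """Regress centred data X (n_obs, n_coords) on a user-chosen measure (n_obs,)."""
    p = predictor - predictor.mean()
    p = p / p.std()                        # unit-variance factor scores
    loadings = p @ X / (p @ p)             # least-squares loadings, one per coordinate
    residual = X - np.outer(p, loadings)   # variance left for later factors
    return p, loadings, residual

def pca_factors(X, n_factors):
    """Extract the n_factors leading principal components of centred data X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]
    loadings = Vt[:n_factors]
    residual = X - scores @ loadings
    return scores, loadings, residual

def rms_error(X, X_hat):
    """Root-mean-square error over all coordinates (an assumed definition)."""
    return float(np.sqrt(np.mean((X - X_hat) ** 2)))

# Synthetic stand-in for articulatory data: 40 articulations, 30 coordinates.
rng = np.random.default_rng(0)
n_obs, n_coords = 40, 30
jaw_height = rng.normal(size=n_obs)        # hypothetical direct measure
latent = rng.normal(size=(n_obs, 2))       # two hidden degrees of freedom
X = (np.outer(jaw_height, rng.normal(size=n_coords))
     + latent @ rng.normal(size=(2, n_coords))
     + 0.01 * rng.normal(size=(n_obs, n_coords)))

mean = X.mean(axis=0)
Xc = X - mean

# Step 1: impose jaw height as the first factor.
jh_scores, jh_load, res1 = imposed_factor(Xc, jaw_height)
# Step 2: let PCA find the best factors for what jaw height does not explain.
pc_scores, pc_load, res2 = pca_factors(res1, 2)

# Reconstruct the data from all three factors and measure the error.
X_hat = mean + np.outer(jh_scores, jh_load) + pc_scores @ pc_load
error = rms_error(X, X_hat)
```

Because the imposed factor is removed before PCA runs, the later factors are uncorrelated with it by construction, which is what makes the imposed factor usable as an independent control parameter. The price, as noted above, is that the imposed factor generally explains less variance than the first principal component would.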
The full version, complete with video illustrations, is available here