MIT CogNet, The Brain Sciences Connection, from the MIT Press

Towards an Audiovisual Virtual Talking Head: 3D Articulatory Modeling of Tongue, Lips and Face Based on MRI and Video Images

 Pierre Badin, Pascal Borel, Gérard Bailly, Lionel Revéret, Monica Baciu and Christoph Segebarth
  
 

Abstract:

For a very long time, articulatory modelling of the vocal tract and speech production organs has been essentially limited to the midsagittal plane, which posed problems in particular for modelling lateral consonants and determining area functions. Thanks to the increasing availability of Magnetic Resonance Imaging (MRI) devices and of image processing tools, it has become possible to acquire 3D vocal tract articulatory data at a reasonable acquisition speed. MR images of the tongue, and front and profile video images of the subject's face marked with small beads, have thus been recorded for one subject producing the French oral vowels and all non-nasal French consonants artificially sustained in three symmetric contexts [a i u]. The modelling approach was then based on a decomposition into linear components of the coordinates defining the 3D geometry of the different organs extracted from these data. Specifically, it consisted in alternating Principal Component Analysis (PCA), which delivers optimal factors explaining the maximum of data variance with a minimum number of factors, and Linear Component Analysis, where factors are arbitrarily imposed by the user. The advantage of imposing some of the factors lies in the possibility of using direct articulatory measures (e.g. jaw height) as a factor, or of controlling the nature and distribution of the variance explained by a factor (to make it more interpretable as a control parameter for a model), though at the cost of a sub-optimal variance explanation. Six parameters were found to control the tongue: Jaw Height, Tongue Body, Tongue Dorsum, Tongue Advance, and two Tongue Tip parameters, with an overall RMS reconstruction error of 0.16 cm. It was also found that the lips and face can be driven by five parameters: the same Jaw Height, Jaw Advance, Lip Protrusion, Lip Height, and Lip vertical elevation, leading to an RMS reconstruction error of 0.1 cm.
Nomograms corresponding to these parameters are exemplified in the text and fully provided in the additional files. In particular, the tongue model made it possible to handle the lateral consonants. Finally, these models have been integrated into the ICP virtual talking head, which will be demonstrated at the conference.
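
The alternation between an imposed factor and PCA described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the imposed factor (e.g. a measured jaw height) is removed from the centred coordinates by ordinary least-squares regression, and that PCA is then applied to the residual. The function names, array shapes, synthetic data, and the per-coordinate definition of RMS error are all hypothetical choices made for illustration.

```python
import numpy as np

def guided_linear_components(X, imposed, n_pca):
    """Extract one imposed factor, then n_pca PCA factors from the residual.

    X       : (n_obs, n_coords) array of organ coordinates (hypothetical layout)
    imposed : (n_obs,) measured articulatory value, e.g. jaw height
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Imposed factor: centre and scale the measurement to unit variance.
    s = (imposed - imposed.mean()) / imposed.std()
    # Loadings of the imposed factor by least-squares regression of Xc on s.
    load_imposed = s @ Xc / (s @ s)
    residual = Xc - np.outer(s, load_imposed)
    # PCA of the residual via SVD: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    scores_pca = U[:, :n_pca] * S[:n_pca]
    scores = np.column_stack([s, scores_pca])
    loadings = np.vstack([load_imposed, Vt[:n_pca]])
    return mean, scores, loadings

def rms_error(X, mean, scores, loadings):
    """Per-coordinate RMS reconstruction error (an assumed definition)."""
    recon = mean + scores @ loadings
    return float(np.sqrt(np.mean((X - recon) ** 2)))

# Synthetic data: one "jaw" factor, two hidden factors, and a little noise.
rng = np.random.default_rng(0)
n_obs, n_coords = 40, 30
jaw = rng.normal(size=n_obs)
latent = rng.normal(size=(n_obs, 2))
X = (np.outer(jaw, rng.normal(size=n_coords))
     + latent @ rng.normal(size=(2, n_coords))
     + 0.01 * rng.normal(size=(n_obs, n_coords)))

mean, scores, loadings = guided_linear_components(X, jaw, n_pca=2)
err = rms_error(X, mean, scores, loadings)
```

Because the imposed factor is regressed out before the PCA step, the remaining factor scores are orthogonal to it. The price, as the abstract notes, is that the imposed factor generally explains less variance than the first unconstrained principal component would.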

The full version, complete with video illustrations, is available here

 
 


© 2010 The MIT Press