Estimation of Vocal Tract Area Function from Magnetic Resonance Imaging: Preliminary Results

 Bernd J. Kröger, Ralf Winkler, Christine Mooshammer and Bernd Pompino-Marschall
  
 

Abstract:

A method has been developed for the three-dimensional reconstruction of the vocal tract shape and for the calculation of the area function from magnetic resonance imaging (MRI). MR images were acquired with a Philips Gyroscan NT scanner at the radiology department of the Virchow Klinikum Berlin. For each sound, a series of 21 parallel sagittal sections, 3.5 mm thick and non-contiguous with inter-image gaps of the same thickness, was acquired. Image acquisition took no more than 21 seconds per sound, so only a single articulation of each sound was needed. MR images of six German long vowels ([i:], [e:], [E:], [a:], [o:], and [u:]) uttered by one subject were collected and analyzed in this study. The analysis procedure comprises three main steps. First, the vocal tract airway centerline is calculated using a speaker-dependent but articulation-independent grid. The grid is positioned with respect to a rectangle defined by three midsagittal points: a lower and an upper point of the vertebral column in the cervical region and the highest point of the palate. Second, the centerline defines a speaker- and articulation-dependent set of nearly equidistant planes perpendicular to it. Third, the width of the vocal tract air path is estimated in each sagittal MR image along the straight lines where the image plane intersects each plane of the set. A semi-automatic procedure was developed to determine the air-tissue boundaries, leading to a complete three-dimensional reconstruction of the vocal tract airway. The derived vocalic area functions serve as input for the calculation of formant frequencies using a frequency-domain articulatory speech synthesizer. The calculated formant values were compared with those produced by the subject (in supine position); across the six vowels, the mean differences are about 8.9%, 10.4%, and 6.1% for F1, F2, and F3, respectively. Furthermore, a principal component analysis of the vocalic area functions derived from these articulatory data indicates that their variance can be described by a few main modes: the cumulative percentage of variance reaches about 71% for one mode, 89% for two, and 96% for three.
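
The reported cumulative variance figures follow directly from a principal component analysis of the area functions. The following is a minimal sketch (not the authors' code), assuming area_functions is an array with one row per vowel and one column per cross-sectional area sampled along the centerline:

import numpy as np

def pca_cumulative_variance(area_functions):
    # Center each cross-section across the vowels
    X = area_functions - area_functions.mean(axis=0)
    # Singular values give the variance captured by each mode
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    variance_ratio = s**2 / np.sum(s**2)
    return np.cumsum(variance_ratio)

With the six vocalic area functions, the first three entries of the returned vector would correspond to the roughly 71%, 89%, and 96% figures quoted above.

The mapping from area function to formant frequencies can likewise be illustrated with a simplified frequency-domain model. The sketch below is not the authors' articulatory synthesizer; it chains lossless tube sections (chain-matrix, or transmission-line, method) and reads formants off the peaks of the volume-velocity transfer function, assuming an ideally closed glottis and an ideally open lip end. The variable names (areas, lengths) and the cgs constants are illustrative assumptions:

import numpy as np

RHO_C = 40.0    # characteristic impedance factor rho*c in cgs units (g cm^-2 s^-1)
C = 35000.0     # speed of sound in cm/s

def transfer_function(areas, lengths, freqs):
    # |U_lips / U_glottis| for a concatenation of lossless tube sections
    H = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        k = 2.0 * np.pi * f / C
        M = np.eye(2, dtype=complex)
        for A, l in zip(areas, lengths):     # sections ordered glottis -> lips
            Z = RHO_C / A
            M = M @ np.array([[np.cos(k * l), 1j * Z * np.sin(k * l)],
                              [1j * np.sin(k * l) / Z, np.cos(k * l)]])
        H[i] = 1.0 / abs(M[1, 1])            # lip pressure assumed zero
    return H

def formants(areas, lengths, fmax=5000.0, df=5.0):
    # Formants are the local maxima of the transfer function magnitude
    freqs = np.arange(df, fmax, df)
    H = transfer_function(areas, lengths, freqs)
    return [freqs[i] for i in range(1, len(H) - 1) if H[i - 1] < H[i] > H[i + 1]]

For a uniform 17.5 cm tube (e.g. 35 sections of 0.5 cm length and 4 cm^2 area) this yields formants near 500, 1500, and 2500 Hz, the textbook values for a neutral vocal tract; an MRI-derived vocalic area function would be used in the same way.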

 
 

