Abstract:
A method has been developed for three-dimensional
reconstruction of the vocal tract shape and for the calculation
of area functions from magnetic resonance imaging (MRI). MR images
were acquired using a Philips Gyroscan NT scanner at the radiology
department of Virchow Klinikum Berlin. For each sound, a series of
21 parallel, non-contiguous sagittal slices was acquired, each
3.5 mm thick and separated by gaps of the same thickness. Image
acquisition took no more than 21 seconds per sound, so each sound
had to be articulated only once. MR images of 6 German
long vowels ([i:], [e:], [E:], [a:], [o:], and [u:]) uttered by
one subject were collected and analyzed in this study. The
analysis procedure comprises three main steps. First, the vocal
tract airway centerline is calculated using a speaker-dependent
but articulation-independent grid. The grid location is based on
a rectangle defined by three points, namely the midsagittal
locations of a lower and an upper point of the vertebral column in
the cervical region and of the highest point of the palate.
Second, the centerline defines a speaker- and
articulation-dependent set of planes, i.e. a set of nearly
equidistant planes perpendicular to this centerline. Third, the
vocal tract air path width is estimated for each sagittal
MR image along the straight lines in which the plane of the
MR image intersects each plane of the set. A
semi-automatic procedure was developed to determine the
air-tissue boundaries; it yields a complete
three-dimensional reconstruction of the vocal tract airway. The
derived vocalic area functions serve as input to a
frequency-domain articulatory speech synthesizer, which
calculates the formant frequencies. The calculated formant values
were compared with those produced by the subject (supine
position). Averaged over the six vowels, the mean difference is
about 8.9% for F1, 10.4% for F2, and 6.1% for F3. Furthermore, a
principal component analysis of the vocalic area functions derived
from these articulatory data indicates that their variance can be
described by a few main modes. The cumulative percentage of
variance reaches about 71% for one, 89% for two, and 96% for
three modes.
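The cumulative variance figures quoted for the principal component analysis can be illustrated with a minimal sketch. It assumes the area functions are stored as a matrix with one row per vowel and one column per section along the centerline; the function name `cumulative_variance`, the number of sections (40), and the random placeholder values are hypothetical and are not taken from the study, so the printed numbers will not match the 71%, 89%, and 96% reported above.

```python
import numpy as np


def cumulative_variance(area_functions):
    """Cumulative fraction of variance explained by successive principal modes.

    area_functions: 2-D array, rows = vowels, columns = sections (assumed layout).
    """
    X = np.asarray(area_functions, dtype=float)
    # Center each section (column) on its mean across vowels.
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance carried by each principal mode.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return np.cumsum(variances) / variances.sum()


# Hypothetical placeholder data: 6 vowels x 40 sections, areas in cm^2.
# These are random numbers, NOT the measurements from the study.
rng = np.random.default_rng(0)
demo_areas = rng.uniform(0.2, 8.0, size=(6, 40))

cum = cumulative_variance(demo_areas)
print("cumulative variance, first three modes:", np.round(cum[:3], 3))
```

With only six vowels, at most five modes can carry variance after centering, so a rapid rise of the cumulative curve is partly a consequence of the small sample.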