Abstract:
We would like to develop a more realistic production model of
unvoiced speech sounds, namely fricatives, plosives and
aspiration noise. All three involve turbulence noise generation,
with place-dependent source characteristics that vary with time
(rapidly, in plosives). In this study, we aimed to produce, using
an aero-acoustic model of the vocal-tract filter and source,
voiced as well as unvoiced fricatives that provide a good match
to analyses of speech recordings. The vocal-tract transfer
function (VTTF) was computed by the vocal-tract acoustics
program, VOAC [Davies, McGowan and Shadle. Vocal Fold Physiology:
Frontiers in Basic Science, ed. Titze, Singular Pub., CA, 93-142,
1993], using geometrical data, in the form of cross-sectional
area and hydraulic radius functions, along the length of the
tract. VOAC incorporates the effects of net flow into the
transmission of plane waves through a tubular representation of
the tract, and relaxes assumptions of rigid walls and isentropic
propagation. The geometry functions were derived from
multiple-slice, dynamic, magnetic resonance images (MRI)
[Mohammad. PhD thesis, Dept. ECS, U. Southampton, UK, 1999;
Shadle, Mohammad, Carter, and Jackson. Proc. ICPhS, S.F. CA,
1:623-626, 1999], using a method of converting from the pixel
outlines that was improved over earlier efforts on vowels. A
coloured noise source signal was combined with the VTTF and
radiation characteristic to synthesize the unvoiced fricative
[s]. For its voiced counterpart [z], many researchers have noted
that the noise source appears to be modulated by voicing.
Furthermore, the phase of the modulation has been shown to be
perceptually significant. Based on our analysis [Jackson and
Shadle. Proc. IEEE-ICASSP, Istanbul, 2000.] of recordings by the
same subject, the frication source of [z] was varied periodically
according to fluctuations in the flow velocity at the
constriction exit, and the modulation phase was governed by the
convection time for the flow perturbation to travel from the
constriction to the obstacle. The synthesized fricatives were
compared to the speech recordings in a simple listening test, and
comparisons of the predicted and measured time series suggested
that the model, which brings together physical, aerodynamic and
acoustic information, can replicate characteristics of real
speech, such as the modulation in voiced fricatives.
http://www.isis.ecs.soton.ac.uk/research/projects/nephthys/
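
The modulated frication source described for [z] can be illustrated with a minimal sketch. This is not the model used in the study: the cosine envelope, the first-difference noise colouring, the function name `modulated_frication`, and every parameter value (fundamental frequency, modulation depth, constriction-to-obstacle distance, convection speed) are assumptions chosen only for illustration. It shows how a convection delay would set the phase of the modulation relative to voicing.

```python
import numpy as np

def modulated_frication(f0=120.0, fs=16000, dur=0.2, depth=0.5,
                        distance=0.01, convection_speed=10.0, seed=0):
    """Toy amplitude-modulated noise source (all defaults are hypothetical)."""
    rng = np.random.default_rng(seed)
    n = int(round(dur * fs))
    t = np.arange(n) / fs

    # Crude "coloured" noise: first difference of white noise,
    # which emphasises high frequencies (an assumed colouring).
    noise = np.diff(rng.standard_normal(n + 1))

    # Convection time for a flow perturbation to travel from the
    # constriction exit to the obstacle (assumed distance and speed).
    delay = distance / convection_speed

    # The delay is expressed as a phase lag of the periodic envelope.
    phase = 2.0 * np.pi * f0 * delay
    envelope = 1.0 + depth * np.cos(2.0 * np.pi * f0 * t - phase)

    return t, envelope * noise, envelope, delay

t, source, envelope, delay = modulated_frication()
```

In a fuller synthesis the returned source signal would then be passed through the VTTF and radiation characteristic; that filtering step is omitted here.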