Abstract:
The range of speech sounds that a speaker can produce
depends on the range of shapes that the speaker's vocal tract can
assume. Although the articulatory limits can be readily
determined from physiological considerations, their acoustic
consequences are not straightforward to assess.
The region in the acoustical space filled by
all the possible sounds of a given speaker is called the "maximal
space". This concept is of crucial interest in many
speech-related studies, such as the prediction of vocalic
systems, interspeaker normalization, and the development of
speech from infancy to adulthood.
For vowel production, the maximal space has traditionally been
described in terms of the first three or four formants.
Some past studies tried to infer the maximal formant space from
quite simple physiological considerations. Bonder (1982)
obtained estimates using a stylized vocal tract modeled as
a concatenation of 4 tubes. Atal, Chang, Mathews, & Tukey
(1978) generated a large codebook, using a simplified articulatory
model, that covers the entire maximal vowel space. By
introducing an articulatory model, Boe, Perrier, Guerin &
Schwartz (1989) were able to present the region of the formant
space covered by a systematic exploration of the articulatory
commands.
In spite of the progress achieved in these studies, we still
lack a principled way to obtain a description of the boundaries
of the maximal space. Indeed, the studies cited above suffer
from one of two problems: either they do not
use realistic models of speech production, or they fail to give
an explanatory account of the boundaries obtained.
The work described in this paper addresses those issues by
proposing a method for systematically obtaining the limits of the
maximal space based on the exploration of an articulatory model.
We use the model proposed by Maeda (1989), in which seven
articulatory commands (jaw, tongue body, tongue dorsum, tongue
tip, larynx height, and lip protrusion and closing) determine the
formants produced (the first three in our case). Respecting the natural
restrictions of minimal constriction along the vocal tract and of
the allowed range of the articulatory parameters, an optimization
procedure is used to reach the boundary of the maximal space from
a set of different articulatory configurations. The quantity that
is either maximized or minimized here is the acoustical length of
the vocal tract, defined as the effective path length of a plane
wavefront traveling through the curved vocal tract. In order to
obtain a representative number of points on the boundary, we
constrain the optimization to certain directions in the formant
space.
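
As a rough illustration of the idea of driving a length measure to its
extremes while keeping the articulatory commands inside their allowed
range, the following sketch may help. It is not Maeda's model and not
the procedure of this paper: the two commands, the linear
`tract_length` function, its coefficients, and the bounds of -3 to 3
are all assumptions made for the example, and the vocal tract is
reduced to a uniform tube closed at the glottis and open at the lips,
whose resonances follow the quarter-wavelength rule. The minimal
constriction restriction and the directional constraint in the
formant space are left out.

```python
# Toy sketch: all model coefficients below are illustrative assumptions,
# not values taken from Maeda (1989) or from this paper.
SPEED_OF_SOUND = 35000.0  # cm/s, approximate value for warm, moist air
BOUNDS = (-3.0, 3.0)      # assumed allowed range of each command


def tract_length(commands):
    """Hypothetical length (cm) of a uniform tube set by two commands."""
    larynx_lowering, lip_protrusion = commands
    return 17.0 + 0.4 * larynx_lowering + 0.3 * lip_protrusion


def formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_cm)
            for n in range(1, count + 1)]


def clip(value, low, high):
    """Project a single command back into its allowed range."""
    return max(low, min(high, value))


def optimize_length(start, sign, step=0.5, iterations=200, eps=1e-4):
    """Projected gradient search: sign=+1 maximizes length, -1 minimizes it."""
    commands = list(start)
    for _ in range(iterations):
        base = tract_length(commands)
        gradient = []
        for i in range(len(commands)):
            # Forward finite difference along command i.
            probe = list(commands)
            probe[i] += eps
            gradient.append((tract_length(probe) - base) / eps)
        # Step along the signed gradient, then project back into the box.
        commands = [clip(c + sign * step * g, *BOUNDS)
                    for c, g in zip(commands, gradient)]
    return commands


longest = optimize_length([0.0, 0.0], sign=+1)
shortest = optimize_length([0.0, 0.0], sign=-1)
for label, cmd in (("longest", longest), ("shortest", shortest)):
    length = tract_length(cmd)
    print(label, [round(c, 2) for c in cmd], round(length, 2),
          [round(f) for f in formants(length)])
```

In this toy setting the longest tube yields the lowest formants and the
shortest tube the highest, which gives an intuition for why a length
measure is a natural quantity to push toward the boundary. A realistic
version would replace `tract_length` and `formants` with an articulatory
model and its acoustic simulation.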