Abstract:
Contextual coarticulatory effects, such as carry-over and
anticipatory movements, are characteristic features of continuous
speech utterances. Representation of these contextual effects is
a major issue in task-oriented trajectory formation of
articulatory movements. This paper presents a novel method of
representing phoneme-specific articulatory targets (phonemic
tasks) and a dynamic articulatory model for generating
articulatory movements from specified phonemic tasks. Phonemic
tasks are formally defined using invariant features of
articulatory posture, such as movements making vocal-tract
constrictions or relative movements among articulators reflecting
task-sharing structures, which are consistent and less variable
across utterance conditions. The invariant feature is obtained as a
linear transformation that minimizes a criterion, namely the ratio of
within-class articulatory variation to the total variation. By
solving a generalized eigenvalue problem constructed from the
covariance matrices of articulatory data, it is possible to obtain
phoneme-invariant features that represent consistent
articulatory gestures during the production of the phoneme. In
the trajectory formation of articulatory movements, there remain
unconstrained kinematic degrees-of-freedom of articulatory
variables since the dimension of the phonemic task is smaller
than that of articulatory variables. These redundant components
are resolved using dynamic constraints representing the smoothly
moving behavior of the articulators, and articulatory movements
are determined so that they satisfy the given phonemic tasks and
dynamic constraints simultaneously. Based on this framework, our
model can explain contextual articulatory variability using
context-independent phonemic tasks, since the articulatory behavior
corresponding to the redundant components is organized so that it
smoothly interpolates the targets of adjacent phonemes.
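The invariant-feature criterion described above, the ratio of within-class to total articulatory variation, can be minimized by solving a generalized eigenvalue problem over the two covariance matrices. The following is a minimal NumPy sketch of this idea, not the paper's implementation; the data layout and function names are assumptions.

```python
import numpy as np

def invariant_features(X, labels, n_features=1):
    """Find linear features minimizing within-class / total variation.

    Hypothetical sketch (not the paper's implementation): X holds one
    articulatory sample per row; labels gives each sample's phoneme class.
    """
    X = X - X.mean(axis=0)
    total = X.T @ X / len(X)            # total covariance matrix
    within = np.zeros_like(total)       # pooled within-class covariance
    for c in np.unique(labels):
        Xc = X[labels == c] - X[labels == c].mean(axis=0)
        within += Xc.T @ Xc
    within /= len(X)
    # Generalized eigenvalue problem: within @ v = lam * total @ v.
    # Directions with the smallest ratio lam are the invariant features.
    vals, vecs = np.linalg.eig(np.linalg.solve(total, within))
    order = np.argsort(vals.real)
    return vecs.real[:, order[:n_features]], vals.real[order[:n_features]]
```

A direction along which a phoneme's articulation is consistent (small within-class variation relative to the overall spread) yields a small eigenvalue and is returned first.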
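The resolution of the redundant degrees of freedom can likewise be sketched as an equality-constrained quadratic program: minimize a simple smoothness measure (squared second differences) of a one-dimensional trajectory subject to hard targets at given frames. This is a hypothetical simplification of the model's dynamic constraints, not its actual formulation.

```python
import numpy as np

def smooth_trajectory(n_frames, targets):
    """Minimum-acceleration 1-D trajectory through hard frame targets.

    Hypothetical simplification: minimize the sum of squared second
    differences subject to x[f] = v for each (f, v) in `targets`
    (needs at least two target frames to pin down the affine null space).
    """
    # Second-difference (acceleration) operator D: (n-2) x n.
    D = np.zeros((n_frames - 2, n_frames))
    for t in range(n_frames - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    Q = D.T @ D
    frames = sorted(targets)
    A = np.zeros((len(frames), n_frames))   # equality constraints A x = b
    for i, f in enumerate(frames):
        A[i, f] = 1.0
    b = np.array([targets[f] for f in frames], dtype=float)
    # KKT system of the equality-constrained quadratic program.
    K = np.block([[Q, A.T], [A, np.zeros((len(frames), len(frames)))]])
    rhs = np.concatenate([np.zeros(n_frames), b])
    return np.linalg.solve(K, rhs)[:n_frames]
```

The unconstrained frames are filled in by the smoothness objective, which is how context-independent targets can still produce context-dependent interpolated movements.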
Although phonemes exhibit consistent articulatory behaviors when
they are articulated with the lips or the tongue tip, it is
rather difficult to find such consistencies for back vowels and
velar consonants due to the contextual variability of the tongue
body. Therefore, this paper further investigates the use of
allophonic targets for these phonemes to achieve an accurate
representation of contextual articulatory movements. By allowing
a small number of context-sensitive variations of the
articulatory target, automatic extraction of allophonic targets
is investigated by clustering an articulatory data set and assigning
every triphonic context to one of the resulting clusters. In the
generation of articulatory movements,
these allophonic targets are switched according to the match between
the input phoneme context and the triphonic contexts assigned to each
cluster. Finally, the accuracy of the articulatory model in
predicting articulatory movements is evaluated quantitatively.
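The allophonic-target extraction and switching described above can be sketched with a plain k-means clustering of per-context target vectors and a dictionary mapping triphone contexts to clusters. All names and the clustering details below are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def cluster_targets(targets, n_clusters=2, n_iter=20):
    """Naive k-means over per-context articulatory target vectors.

    Illustrative sketch only; initialization simply picks evenly spaced
    rows, which is not a robust choice in general.
    """
    idx = np.linspace(0, len(targets) - 1, n_clusters).astype(int)
    centers = targets[idx].astype(float)
    for _ in range(n_iter):
        # Assign each target to the nearest center, then update centers.
        labels = np.argmin(((targets[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = targets[labels == k].mean(axis=0)
    return centers, labels

def select_target(context, context_to_cluster, centers, default=0):
    """Switch allophonic targets by matching the input triphone context."""
    return centers[context_to_cluster.get(context, default)]
```

A context such as "a-k+u" (a hypothetical triphone label) would be looked up in `context_to_cluster` to pick its allophonic target; unseen contexts fall back to a default cluster.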