MIT CogNet, The Brain Sciences Connection · The MIT Press
Dynamic Articulatory Model Based on Multidimensional Invariant-feature Task Representation

 Tokihiko Kaburagi and Masaaki Honda
  
 

Abstract:

Contextual coarticulatory effects, such as carry-over and anticipatory movements, are characteristic features of continuous speech utterances. Representation of these contextual effects is a major issue in task-oriented trajectory formation of articulatory movements. This paper presents a novel method of representing phoneme-specific articulatory targets (phonemic tasks) and a dynamic articulatory model for generating articulatory movements from specified phonemic tasks. Phonemic tasks are formally defined using invariant features of articulatory posture, such as movements making vocal-tract constrictions or relative movements among articulators reflecting task-sharing structures, which are consistent and less variable across utterance conditions. The invariant feature is obtained as a linear transformation that minimizes a criterion, namely the ratio of within-class articulatory variation to total variation. By solving a generalized eigenvalue problem constructed from the covariance matrices of articulatory data, it is possible to obtain the phoneme-invariant features that represent consistent articulatory gestures during the production of the phoneme. In the trajectory formation of articulatory movements, unconstrained kinematic degrees of freedom of the articulatory variables remain, since the dimension of the phonemic task is smaller than that of the articulatory variables. These redundant components are resolved using dynamic constraints representing the smooth movement behavior of the articulators, and articulatory movements are determined so that they satisfy the given phonemic tasks and dynamic constraints simultaneously. Based on this framework, our model can explain contextual articulatory variability using context-independent phonemic tasks, since the articulatory behavior corresponding to the redundant components is organized so that it smoothly interpolates the targets of adjacent phonemes.
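The invariant-feature extraction described above can be sketched as a generalized eigenvalue problem, much like linear discriminant analysis but keeping the directions with the *smallest* within-class-to-total variation ratio. The function name, data shapes, and the use of `scipy.linalg.eigh` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.linalg import eigh  # solves the generalized symmetric eigenproblem


def invariant_features(X, labels, n_features=2):
    """Sketch: find a linear transformation minimizing the ratio of
    within-class articulatory variation to total variation.

    X      : (n_samples, n_dims) articulatory measurements (hypothetical layout)
    labels : phoneme class label for each sample
    Returns an (n_dims, n_features) projection matrix whose columns are
    the most phoneme-invariant directions.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Total covariance of the articulatory data
    St = np.cov(X, rowvar=False)

    # Within-class covariance: weighted average of per-phoneme covariances
    Sw = np.zeros_like(St)
    for c in np.unique(labels):
        Xc = X[labels == c]
        Sw += (len(Xc) / len(X)) * np.cov(Xc, rowvar=False)

    # Generalized eigenproblem  Sw v = lambda St v.
    # eigh returns eigenvalues in ascending order, so the leading columns
    # correspond to the smallest within/total variation ratios,
    # i.e. the most consistent (invariant) articulatory directions.
    _, V = eigh(Sw, St)
    return V[:, :n_features]
```

Each returned column defines one invariant feature; projecting articulatory data onto these columns yields the consistent gestures used as phonemic tasks.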
Although phonemes exhibit consistent articulatory behaviors when they are articulated by the lips or the tip of the tongue, it is rather difficult to find such consistencies for back vowels and velar consonants due to the contextual variability of the tongue body. Therefore, this paper further investigates the use of allophonic targets for these phonemes to achieve an accurate representation of contextual articulatory movements. By allowing a small number of context-sensitive variations of the articulatory target, automatic extraction of allophonic targets is investigated on the basis of clustering an articulatory data set and assigning every triphonic context to one of the resulting clusters. In the generation of articulatory movements, these allophonic targets are switched based on the match between the input phoneme context and the triphonic contexts assigned to each cluster. Finally, the accuracy of the articulatory model in predicting articulatory movements is evaluated quantitatively.
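The allophonic-target extraction above, i.e. clustering articulatory targets and mapping each triphonic context to a cluster, can be sketched with a plain k-means loop. The function names, the majority-vote context assignment, and the data layout are all assumptions made for illustration; the paper's actual clustering procedure may differ:

```python
from collections import defaultdict

import numpy as np


def allophonic_targets(targets, contexts, n_clusters=2, n_iter=20, seed=0):
    """Sketch: cluster articulatory target vectors (k-means) and assign
    each triphonic context to one cluster by majority vote.

    targets  : (n_samples, n_dims) articulatory target vectors
    contexts : triphone-context label for each sample, e.g. "a-b-a"
    Returns (cluster centroids, {context: cluster index}).
    """
    rng = np.random.default_rng(seed)
    T = np.asarray(targets, dtype=float)

    # Initialize centroids from randomly chosen target vectors
    centers = T[rng.choice(len(T), n_clusters, replace=False)]

    for _ in range(n_iter):
        # Assign every target to its nearest centroid
        dists = np.linalg.norm(T[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned targets
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = T[assign == k].mean(axis=0)

    # Map each triphonic context to the cluster most of its samples fall in
    votes = defaultdict(list)
    for ctx, k in zip(contexts, assign):
        votes[ctx].append(int(k))
    ctx_map = {ctx: max(set(ks), key=ks.count) for ctx, ks in votes.items()}
    return centers, ctx_map
```

At synthesis time the model would look up the input triphone in `ctx_map` and use the corresponding centroid as the context-sensitive target, which matches the target-switching step described above.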

 
 


© 2010 The MIT Press