Acoustic-to-articulatory Inversion Using Dynamical and Phonological Constraints

 Sorin Dusan and Li Deng
  
 

Abstract:

A well-known difficulty in using the articulatory representation for applications in speech coding, synthesis, and recognition is the poor accuracy in estimating the articulatory parameters from the acoustic speech signal. The difficulty is especially serious for most classes of consonantal sounds. This paper presents a statistical method for estimating articulatory trajectories from the speech signal, based on training databases of physiological measurements of articulatory and acoustic parameters obtained from continuous speech utterances. The estimation of articulatory trajectories uses the extended Kalman filtering technique and is based on new linguistic constraints imposed on the acoustic-to-articulatory inversion. These constraints are implemented mainly by dividing the overall articulatory-acoustic function into a number of phonological sub-functions, each corresponding to a unit of speech defined as the pattern of continuous transition between two consecutive phonemes. A state-space dynamical model was used to represent each phonological unit of speech, with a different articulatory-acoustic sub-function modeled as part of the state-space model for each unit. An automatic method for segmenting the speech signal and recognizing the phonological units was developed based on likelihood computation from Kalman filtering with the different models. The final estimate of the articulatory trajectories was obtained from a Kalman smoother using the parameters of the recognized models. The whole speech-inversion method was developed using synthesized speech data obtained with an articulatory synthesizer and was then evaluated on real speech data recorded with an articulograph and an X-ray microbeam system. Estimation results compared against the articulographic and X-ray speech data are presented in this paper. Average RMS errors of about 2 mm were obtained between the estimated and actual articulatory trajectories.
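The recognition-and-smoothing scheme described in the abstract can be summarized as follows: each phonological unit is assigned its own state-space model, the filtering likelihood of an acoustic segment is computed under every candidate model, and the best-scoring model's smoothed state estimate gives the articulatory trajectory. The sketch below is not the authors' code; it illustrates the idea with a linear-Gaussian Kalman filter and Rauch-Tung-Striebel smoother in Python with NumPy, whereas the paper uses an extended Kalman filter for the nonlinear articulatory-to-acoustic mapping. All matrices, dimensions, and the class and function names are illustrative assumptions.

import numpy as np

class UnitStateSpaceModel:
    """One model per phonological unit (hypothetical, linear-Gaussian simplification):
       x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)   (articulatory state dynamics)
       y_t = C x_t + v_t,      v_t ~ N(0, R)   (articulatory-to-acoustic sub-function)
    """
    def __init__(self, A, C, Q, R, x0, P0):
        self.A, self.C, self.Q, self.R = A, C, Q, R
        self.x0, self.P0 = x0, P0

    def filter(self, Y):
        """Kalman filter over acoustic frames Y (T x dy).
        Returns the total log-likelihood plus the quantities needed for smoothing."""
        dx = len(self.x0)
        x, P = self.x0, self.P0
        loglik = 0.0
        x_pred, P_pred, x_filt, P_filt = [], [], [], []
        for y in Y:
            # Predict the articulatory state one frame ahead
            xp = self.A @ x
            Pp = self.A @ P @ self.A.T + self.Q
            # Innovation: observed acoustics vs. model prediction
            S = self.C @ Pp @ self.C.T + self.R
            e = y - self.C @ xp
            loglik += -0.5 * (e @ np.linalg.solve(S, e)
                              + np.linalg.slogdet(2.0 * np.pi * S)[1])
            # Measurement update
            K = Pp @ self.C.T @ np.linalg.inv(S)
            x = xp + K @ e
            P = (np.eye(dx) - K @ self.C) @ Pp
            x_pred.append(xp); P_pred.append(Pp)
            x_filt.append(x);  P_filt.append(P)
        return loglik, x_pred, P_pred, x_filt, P_filt

    def smooth(self, Y):
        """Rauch-Tung-Striebel smoother: the final articulatory trajectory estimate."""
        _, x_pred, P_pred, x_filt, P_filt = self.filter(Y)
        xs = list(x_filt)
        for t in range(len(Y) - 2, -1, -1):
            J = P_filt[t] @ self.A.T @ np.linalg.inv(P_pred[t + 1])
            xs[t] = x_filt[t] + J @ (xs[t + 1] - x_pred[t + 1])
        return np.array(xs)

def invert_segment(Y, models):
    """Score the acoustic segment under every phonological-unit model and
    return the winning unit together with its smoothed articulatory trajectory."""
    scores = {name: m.filter(Y)[0] for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, models[best].smooth(Y)

In this simplified view, recognizing a phonological unit amounts to maximum-likelihood model selection over the trained unit models; applying it to continuous speech would additionally require, as the paper describes, segmenting the utterance by comparing likelihoods across hypothesized segment boundaries before the final smoothing pass.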

 
 

