Bridging the Gap Between Speech Production and Speech Recognition

John Hogden and Patrick Valdez

Abstract:

Although stochastic models of speech signals (e.g., hidden Markov models, trigrams) have led to impressive improvements in speech recognition accuracy, it has been noted that these models bear little relationship to speech production, and their recognition performance on some important tasks is far from perfect. However, there have been recent attempts to bridge the gap between speech production and speech recognition using models that are stochastic and yet make more reasonable assumptions about the mechanisms underlying speech production. One of these models, Multiple Observable Maximum Likelihood Continuity Mapping (MO-MALCOM), is described in this paper. There are theoretical and experimental reasons to believe that MO-MALCOM learns a stochastic mapping between articulator positions and speech acoustics, even though no articulator measurements are available during training. Furthermore, MO-MALCOM can be combined with standard speech recognition algorithms to form a speech recognition approach based on a production model. Results of experiments on MO-MALCOM's recovery of articulator positions are summarized. It is pointed out that MO-MALCOM's ability to learn a mapping between acoustics and articulation without a training signal has important theoretical implications for theories of speech perception and speech production. As an example of the implications for speech production models, consider that it has been argued that phoneme targets must be acoustic because there is no teaching signal to help learn the mapping between acoustics and tract variables. Such an argument is not valid if a teaching signal is not required to learn the mapping.
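
To make the central idea more concrete, the following is a minimal illustrative sketch (in Python/NumPy) of a continuity-mapping-style learner in the spirit of MALCOM; it is not the authors' actual MO-MALCOM implementation. Acoustic frames are assumed to have already been vector-quantized into codes, each code is given a Gaussian density over a low-dimensional latent ("pseudo-articulator") space, and smooth latent paths are estimated by maximizing code likelihood under a continuity penalty. All function names, parameters, and the simplified unit-covariance and quadratic-penalty choices are assumptions made for illustration.

import numpy as np

def fit_continuity_map(code_seqs, n_codes, latent_dim=2,
                       smooth_weight=10.0, n_iters=50, lr=0.05, seed=0):
    """Alternately estimate smooth latent paths and per-code Gaussian means."""
    rng = np.random.default_rng(seed)
    # One Gaussian mean per VQ code in the latent space (unit covariance for simplicity).
    means = rng.normal(scale=0.1, size=(n_codes, latent_dim))
    # One latent path per utterance, initialized near the origin.
    paths = [rng.normal(scale=0.1, size=(len(seq), latent_dim)) for seq in code_seqs]

    for _ in range(n_iters):
        # (1) Update paths: gradient ascent on the code log-likelihood minus a
        #     quadratic penalty on frame-to-frame movement (the continuity constraint).
        for seq, x in zip(code_seqs, paths):
            diffs = x[1:] - x[:-1]
            grad = means[seq] - x               # pull each frame toward its code's mean
            grad[1:] -= smooth_weight * diffs   # penalty gradient, backward difference
            grad[:-1] += smooth_weight * diffs  # penalty gradient, forward difference
            x += lr * grad
        # (2) Update means: each code's mean moves to the average latent position
        #     of the frames labeled with that code (an M-step-like re-estimation).
        sums = np.zeros_like(means)
        counts = np.zeros(n_codes)
        for seq, x in zip(code_seqs, paths):
            np.add.at(sums, seq, x)
            np.add.at(counts, seq, 1)
        used = counts > 0
        means[used] = sums[used] / counts[used, None]
    return means, paths

# Toy usage: two "utterances" given as sequences of VQ code indices.
if __name__ == "__main__":
    codes = [np.array([0, 0, 1, 2, 2, 3]), np.array([3, 2, 1, 1, 0])]
    means, paths = fit_continuity_map(codes, n_codes=4)
    print(np.round(paths[0], 2))  # a smooth latent ("pseudo-articulator") trajectory

In this sketch, no articulatory measurements enter the training loop; only the code sequences and the continuity constraint shape the latent trajectories. A trained map of this kind could then supply latent trajectories as input features to a conventional recognizer, which is the sort of combination with standard recognition algorithms the abstract describes.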



© 2010 The MIT Press