MIT CogNet, The Brain Sciences Connection (from the MIT Press)

An Investigation of Voice Quality Based on Modifications of the Neutral Vocal Tract Shape

 Brad H. Story and Ingo R. Titze
  
 

Abstract:

Voice quality generally refers to the extralinguistic features of a speaker's voice that may provide cues to identity, personality, health, and emotional state. Such features account for much of the acoustic variability found in speech signals across individuals. A broad description of voice quality would include features contributed by all the subsystems of speech production, that is, the respiratory, phonatory, and articulatory systems. To narrow the focus, this paper models and simulates only those aspects of voice quality that arise from the shaping of the vocal tract.

This work was influenced by Laver [Laver, J. (1980). The Phonetic Description of Voice Quality, Cambridge University Press.], who proposed that long-term "settings" of the vocal tract bias the resulting formant structure toward a particular type of global timbre. He divided vocal tract settings into two broad categories, longitudinal and latitudinal. Longitudinal settings describe the state of the long axis of the vocal tract, such as larynx height and protrusion or retraction of the lips. Latitudinal settings are "tendencies to maintain a particular constrictive (or expansive) effect" within some region along the length of the vocal tract.

This study pursues the idea that voice quality can be partially represented by the underlying shape of a speaker's neutral vocal tract, while more generic, or canonical, movement patterns superimpose the linguistically relevant deformations on that neutral shape. Using an area function model, which allows direct access to the neutral tract shape, four separate modifications were made to the neutral vocal tract shape of one male speaker. The modifications involve the pharyngeal and oral cavities, the lip aperture, and the size of the epilaryngeal tube. A single-word utterance and a sentence were first simulated with the original neutral tract shape, and each was then simulated again with each of the four modified neutral shapes. The modifications are demonstrated with sound files, and the formant trajectories resulting from each modification are shown and discussed.
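To illustrate how a change to the neutral area function shifts the formants, the sketch below uses a generic lossless concatenated-tube (chain-matrix) calculation. It is not the authors' area function model or simulation method, which the abstract does not specify. Everything here is a hypothetical assumption: the uniform 3 cm² "neutral" shape, the 44 sections, the 17.5 cm tract length, the air constants, and the helper names `chain_matrix`, `formants`, and `scale_region`. The code treats formants as resonances with a closed glottis and open lips. It "modifies" the neutral shape by narrowing the sections nearest the glottis, which acts as a crude stand-in for a pharyngeal constriction setting.

```python
import numpy as np

# Assumed constants (CGS units). Formant frequencies depend only on C, the
# section lengths, and area ratios; RHO only scales the impedances.
RHO = 0.00114  # density of air, g/cm^3 (assumed)
C = 35000.0    # speed of sound, cm/s (assumed)


def chain_matrix(areas, section_len, freq):
    """Lossless chain matrix from glottis (areas[0]) to lips (areas[-1])."""
    kl = 2 * np.pi * freq / C * section_len
    total = np.eye(2, dtype=complex)
    for a in areas:
        z = RHO * C / a  # characteristic acoustic impedance of this section
        section = np.array([[np.cos(kl), 1j * z * np.sin(kl)],
                            [1j * np.sin(kl) / z, np.cos(kl)]])
        total = total @ section
    return total


def formants(areas, section_len, fmax=5000.0, df=5.0):
    """Resonances with a closed glottis and open lips (P_lips = 0).

    U_lips / U_glottis = 1 / K22, so formants are the zeros of K22,
    which is real-valued for a lossless tube.
    """
    def k22(f):
        return chain_matrix(areas, section_len, f)[1, 1].real

    # Half-step offset keeps grid points off exact roots such as c/(4L).
    freqs = np.arange(df / 2, fmax, df)
    values = [k22(f) for f in freqs]
    found = []
    for i in range(len(freqs) - 1):
        if values[i] * values[i + 1] < 0:  # sign change brackets a root
            lo, hi, v_lo = freqs[i], freqs[i + 1], values[i]
            for _ in range(40):  # bisection refinement
                mid = 0.5 * (lo + hi)
                v_mid = k22(mid)
                if v_lo * v_mid <= 0:
                    hi = mid
                else:
                    lo, v_lo = mid, v_mid
            found.append(0.5 * (lo + hi))
    return found


def scale_region(areas, start, stop, factor):
    """Hypothetical 'setting': scale the areas of sections start..stop-1."""
    modified = np.array(areas, dtype=float)
    modified[start:stop] *= factor
    return modified


if __name__ == "__main__":
    N = 44
    SECTION_LEN = 17.5 / N        # hypothetical 17.5 cm tract
    neutral = np.full(N, 3.0)     # hypothetical uniform neutral shape, cm^2
    narrow_pharynx = scale_region(neutral, 0, 15, 0.4)  # index 0 = glottis
    print([round(float(f)) for f in formants(neutral, SECTION_LEN)[:3]])
    # prints [500, 1500, 2500], i.e. odd multiples of c/(4L)
    print([round(float(f)) for f in formants(narrow_pharynx, SECTION_LEN)[:3]])
```

For the uniform tube, K22 reduces to cos(kL), so the formants fall at odd multiples of c/(4L). Narrowing the glottal-end third should raise F1 above 500 Hz. This matches the familiar behavior of a back-constricted two-tube model. A realistic study would also need wall, radiation, and viscous losses, which this sketch omits.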

 
 


© 2010 The MIT Press