The instrumental analysis of speech can be approached through the three main stages of the speech chain: articulatory, acoustic, and auditory phonetics. This entry reviews the instrumentation used to assess the articulatory and acoustic phases. Auditory phonetic techniques are covered elsewhere in this volume.
Although speech planning in the brain (neurophonetics) lies outside the traditional tripartite speech chain, the neurological aspects of both speech production and perception can be studied with brain imaging techniques. Speech articulation proper is deemed here to begin with the movement of muscles required to produce aerodynamic changes resulting in the flow of an airstream (see Laver, 1994; Ball and Rahilly, 1999). Muscle movements, of course, also occur throughout articulation. This area has been investigated using electromyography (EMG). In EMG, electrodes of different types (surface, needle, and hooked wire) are used to gather data on electrical activity within target muscles, and these data are matched against a simultaneously recorded speech signal (Stone, 1996; Gentil and Moore, 1997). In this way the timing of muscle activity relative to different aspects of speech can be investigated. The technique has been used to examine both normal and disordered speech. Areas studied include the respiratory and laryngeal muscles and the muscle groups of the lips, tongue, and soft palate; disorders investigated include disorders of voice and fluency and certain acquired neurological conditions. Figure 1 shows EMG traces from a patient with Friedreich's ataxia.
Figure 1.
Averaged integrated EMG signals for the mentalis muscle (MENT), orbicularis oris inferior (OOI), orbicularis oris superior (OOS), anterior belly of the digastric (ABD), and the depressor labii inferior (DLI) for a patient with Friedreich's ataxia uttering /epapap/. (Courtesy of Michèle Gentil.)
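To illustrate how such records are derived, the sketch below (Python with NumPy/SciPy; the sampling rate, cutoff, and signal are invented for illustration) rectifies and low-pass filters a raw EMG trace to yield the kind of smoothed, integrated envelope plotted in Figure 1; averaging such envelopes over repeated utterances gives the averaged traces shown there.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def integrated_emg(raw_emg, fs, cutoff=20.0):
    """Full-wave rectify a raw EMG trace, then low-pass filter it to
    produce a smoothed (integrated) muscle-activity envelope."""
    rectified = np.abs(raw_emg - np.mean(raw_emg))    # remove DC offset, rectify
    b, a = butter(4, cutoff / (fs / 2), btype="low")  # 4th-order Butterworth
    return filtfilt(b, a, rectified)                  # zero-phase: timing preserved

# Hypothetical usage: a 2-s trace sampled at 2 kHz, to be time-aligned
# with a simultaneously recorded speech signal.
fs = 2000.0
t = np.arange(0, 2.0, 1 / fs)
raw = np.random.randn(t.size) * (1 + np.sin(2 * np.pi * 2 * t))  # toy activity bursts
envelope = integrated_emg(raw, fs)
```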
Aerodynamic activity in speech is studied through aerometry. A variety of devices have been used to measure speech aerodynamics (Zajac and Yates, 1997). Many systems employ an airtight mask placed over the subject's face and attached to a pneumotachograph. The mask contains sensors to measure pressure changes and airflow at the nose and mouth, and generally also a microphone to record the speech signal, against which the airflow can be plotted. If the focus of attention is lung volume changes, a plethysmograph may be employed. This is an airtight box that houses the subject; any changes to the air pressure within the box (caused by changes in the subject's lung volume) are recorded. A simpler plethysmograph (the Respitrace; see Stone, 1996) consists of a wire band placed around the subject's chest that measures changes in cross-sectional area during inhalation and exhalation.
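The pneumotachograph principle itself is simple enough to state in a few lines: volume velocity is recovered from the pressure drop across a resistive screen of known, linear resistance, and lung volume change is the running integral of that flow. The following is a minimal sketch of that arithmetic (Python/NumPy; the resistance value and sample data are invented):

```python
import numpy as np

def flow_from_pressure_drop(delta_p, resistance):
    """Pneumotachograph principle: airflow (L/s) is proportional to the
    pressure drop (cmH2O) across a mesh of known, linear resistance."""
    return delta_p / resistance

def volume_change(flow, fs):
    """Cumulative volume (L) is the running integral of flow over time."""
    return np.cumsum(flow) / fs

# Hypothetical example: 0.5 s of sampled pressure-drop data at 1 kHz.
fs = 1000.0
delta_p = np.full(int(0.5 * fs), 0.1)         # constant 0.1 cmH2O drop
flow = flow_from_pressure_drop(delta_p, 0.5)  # assumed 0.5 cmH2O/(L/s) resistance
vol = volume_change(flow, fs)                 # ends at ~0.1 L exhaled
```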
In normal pulmonic egressive speech, airflow from the lungs passes through the larynx, where a variety of phonation types may be implemented. The study of laryngeal activity (more particularly, of vocal fold activity) can be direct or indirect. In direct study, a rigid or flexible endoscope connected to a camera is used to view the movements of the folds. This technology is often coupled with a stroboscopic light source, as stroboscopic endoscopy gives the viewer an apparent slow-motion view of the individual movements of the folds (see Abberton and Fourcin, 1997). Endoscopy, however, is invasive, and use of a rigid endoscope precludes normal speech. Indirect investigation of vocal fold activity is undertaken with electroglottography (EGG), also termed electrolaryngography (see Stone, 1996; Abberton and Fourcin, 1997). This technique allows vocal fold movement to be inferred from the varying electrical resistance measured across the larynx. Both approaches have been used in the investigation of normal and disordered voice.
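Because the EGG waveform tracks the vocal fold contact cycle directly, it offers a cleaner basis for fundamental frequency estimation than the acoustic signal. A minimal autocorrelation-based sketch is given below (Python/NumPy; the sampling rate, search range, and test waveform are assumptions for illustration):

```python
import numpy as np

def egg_f0(egg, fs, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency from one frame of an EGG signal
    by locating the autocorrelation peak within a plausible F0 range."""
    x = egg - np.mean(egg)
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # positive lags only
    lo, hi = int(fs / fmax), int(fs / fmin)            # lag search bounds
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

# Hypothetical check: a synthetic 120 Hz contact-like waveform at 10 kHz.
fs = 10000.0
t = np.arange(0, 0.1, 1 / fs)
synthetic = np.maximum(0.0, np.sin(2 * np.pi * 120 * t)) ** 3  # pulse-like cycles
print(round(egg_f0(synthetic, fs)))  # ~120
```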
Velic action and the associated differences in oral and nasal airflow (and hence in nasal resonance) can also be measured directly or indirectly. The velotrace is an instrument designed to indicate the height of the velum directly (see Bell-Berti et al., 1993), while nasometric devices of varying sophistication measure oral versus nasal airflow (see Zajac and Yates, 1997). The velotrace is invasive, as part of the device must be inserted into the nasal cavity to rest on the upper surface of the velum. Nasometers measure indirectly, using, for example, two external microphones to capture the difference between oral and nasal acoustic output. Figure 2 shows a trace from the Kay Elemetrics nasometer of hypernasal speech.
Figure 2.
Trace adapted from a Kay Elemetrics nasometer showing normal and hypernasal versions of “eighteen, nineteen, twenty.”
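Acoustic nasometers of this kind conventionally report a nasalance score: nasal acoustic energy as a percentage of combined nasal and oral energy. The sketch below shows that computation in minimal form (Python/NumPy; the use of frame RMS and the toy data are assumptions about a typical implementation):

```python
import numpy as np

def nasalance(nasal, oral):
    """Nasalance (%) = nasal energy / (nasal + oral energy) * 100,
    here computed from the RMS amplitude of each microphone channel."""
    n = np.sqrt(np.mean(nasal ** 2))
    o = np.sqrt(np.mean(oral ** 2))
    return 100.0 * n / (n + o)

# Hypothetical frames: strong nasal energy yields a raised score,
# as in the hypernasal trace of Figure 2.
rng = np.random.default_rng(0)
oral_frame = rng.standard_normal(512) * 1.0
nasal_frame = rng.standard_normal(512) * 0.8
print(f"{nasalance(nasal_frame, oral_frame):.0f}%")  # ~44%
```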
The next step in the speech production chain is the articulation of sounds. Most important here is the placement of the individual articulators, and electropalatography (EPG) has proved to be a vital development in this area of study. Hardcastle and Gibbon (1997) describe this technique. A thin acrylic artificial palate is made to fit the subject. This palate has a large number of electrodes embedded in it (from 62 to 96, depending on the system employed) to cover important areas for speech (e.g., the alveolar region). When the tongue touches these electrodes, they fire, and the resultant tongue-palate contact patterns can be shown on a computer screen. The electrodes are normally sampled 100 times per second, and the patterns are displayed in real time. This allows the technique to be used both for research and for feedback in therapy. EPG has been used to study normal speech and a wide range of disordered speech patterns. Figure 3 shows tongue-palate contact patterns in a stylized way for two different EPG systems.
Figure 3.
Reading EPG3 system stylized palate diagram (left) showing misarticulated /s/ with wide channel; Kay Palatometer stylized system palate diagram (right) showing target /s/ articulated at the postalveolar region.
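EPG data reduce naturally to a stream of binary contact frames, one per sample. The sketch below (Python/NumPy) shows how a single frame might be stored, rendered, and summarized; the 8×8 grid is a simplification (the Reading system, for instance, uses 62 electrodes with a shorter front row), and the /s/ pattern and contact index are illustrative inventions.

```python
import numpy as np

FRAME_RATE = 100  # frames per second, as in standard EPG sampling

def render_frame(frame):
    """Print one palate frame: rows run front (alveolar) to back (velar)."""
    return "\n".join("".join("#" if c else "." for c in row) for row in frame)

def alveolar_contact(frame, rows=2):
    """Proportion of contacted electrodes in the front (alveolar) rows."""
    front = frame[:rows]
    return front.sum() / front.size

# Hypothetical frame for a target /s/: lateral bracing with a narrow
# midline groove in the alveolar region.
frame = np.zeros((8, 8), dtype=bool)
frame[0:2, :] = True
frame[0:2, 3:5] = False          # the midline groove
frame[2:, [0, 7]] = True         # lateral contact along the sides
print(render_frame(frame))
print(f"alveolar contact: {alveolar_contact(frame):.0%}")
```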
Other ways of examining articulation (and indeed a whole range of speech-related activity) can be subsumed under the overall heading of speech imaging (see Stone, 1996; Ball and Gröne, 1997). The oldest of these techniques is x-radiography. A variety of x-ray techniques have been used in speech research, among them videofluorography, which uses low doses of radiation to give clear pictures of the vocal tract, and x-ray microbeam imaging, in which the movements of pellets attached to relevant points of the tongue and palate are tracked. Because of the dangers of radiation, alternative imaging techniques have been sought. Among these is ultrasound, which maps structures in the vocal tract from the time taken for sound waves to bounce off a structure and return to a receiver. Because ultrasound is almost entirely reflected at tissue–air boundaries, the tongue surface can be mapped (from below), but tongue–palate distances cannot, as the palate lies beyond the air space of the oral cavity. Electromagnetic articulography (EMA) is another tracking technique. The subject is placed within alternating magnetic fields generated by transmitter coils in a helmet assembly, and small receiver coils are attached at articulatorily important sites (e.g., tongue tip, tongue body). The movements of the receiver coils through the alternating fields are measured and recorded by computer. As with x-ray microbeam imaging, the tracked points can be used to infer the shape and movements of articulators within the vocal tract.
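The time-of-flight principle behind ultrasound imaging, mentioned above, can be made concrete: taking the conventional speed of sound in soft tissue (about 1540 m/s), the depth of a reflecting surface is half the echo's round-trip time multiplied by that speed. A minimal sketch (the echo time is invented):

```python
SPEED_IN_TISSUE = 1540.0  # m/s, conventional value for soft tissue

def echo_depth(round_trip_s):
    """Depth (m) of a reflecting surface: the pulse travels there and
    back, so depth is half the round-trip time times the sound speed."""
    return SPEED_IN_TISSUE * round_trip_s / 2.0

# Hypothetical echo from the tongue surface ~60 mm above the transducer.
print(f"{echo_depth(78e-6) * 1000:.1f} mm")  # ~60.1 mm
```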
The final imaging technique to be considered is magnetic resonance imaging (MRI). The imager surrounds the subject with a strong magnetic field generated by electromagnets. This field causes hydrogen protons (abundant in human tissue) to align but also to precess, or wobble. If a brief radio pulse is introduced at the same frequency as the precession, the protons are tipped out of alignment and then relax back again. As they realign, they emit weak radio signals, which can be used to construct an image of the tissue involved. MRI can provide good images of the vocal tract, but currently not at frame rates sufficient for the analysis of continuous speech. All of these imaging techniques have been used to study aspects of both normal and disordered speech. Figure 4 shows ultrasound images of two vowels and two consonants.
Figure 4.
Ultrasound images of two vowels and two consonants. (Courtesy of Maureen Stone.)
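Returning to the MRI principle described above: the frequency the radio pulse must match is the Larmor (precession) frequency, which for hydrogen protons is about 42.58 MHz per tesla of field strength. A minimal sketch of that relationship (the 1.5 T example is an assumption about a typical scanner):

```python
GYROMAGNETIC_MHZ_PER_T = 42.58  # hydrogen (1H) gyromagnetic ratio / (2*pi)

def larmor_frequency_mhz(field_tesla):
    """Precession (Larmor) frequency of hydrogen protons in MHz: the
    radio pulse must match this frequency to tip the protons."""
    return GYROMAGNETIC_MHZ_PER_T * field_tesla

print(f"{larmor_frequency_mhz(1.5):.1f} MHz")  # ~63.9 MHz at 1.5 T
```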
Acoustic analyses via sound spectrography are now easily undertaken with a range of software programs on personal computers, as well as on dedicated hardware-software configurations such as the Kay Elemetrics Sonagraph. Reliable analysis depends on good recordings (see Tatham and Morton, 1997). Spectrographic analysis packages currently allow users to analyze temporal, frequency, amplitude, and intensity aspects of a speech signal (see Baken and Daniloff, 1991; Farmer, 1997). For example, a waveform displays amplitude against time. A wideband spectrogram displays frequency against time using a wide band-pass filter (around 200–300 Hz), giving good time resolution but poor frequency resolution, while a narrow-band spectrogram does the same using a narrow band-pass filter (around 29 Hz), which provides good frequency resolution but poor time resolution. Spectral envelopes show intensity against frequency at a single point in time (produced by either fast Fourier transform or linear predictive coding). Speech analysis research has generally concentrated on wideband spectrograms and spectral envelopes. Both show formant frequencies (bands of high intensity at certain frequency levels), which are useful in identifying and discriminating between vowels and other sonorants. Fricatives are distinguishable by the frequency limits of the broad bands of aperiodic energy clearly visible on a spectrogram, while plosives can be identified from the absence of acoustic activity during the closure stage and from the coarticulatory effects on the formants of neighboring sounds. Segment duration can easily be measured from spectrograms in modern analysis packages, and various pitch extraction algorithms are provided for the investigation of intonation. Farmer (1997) provides an extensive review of acoustic analysis work in a range of disorders: voice, fluency, aphasia, apraxia and dysarthria, child speech disorders, and the speech of the hearing-impaired. Figure 5 shows a wideband spectrogram of disfluent speech.
Figure 5.
Wideband spectrogram of a disfluent speaker producing “(provin)cial t(owns).” (Courtesy of Joan Rahilly.)
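In software, the wideband/narrow-band distinction reduces to the length of the analysis window: a short window approximates the wide filter bandwidth (good time resolution), a long window the narrow bandwidth (good frequency resolution). The sketch below uses scipy.signal.spectrogram to produce both views (the sampling rate, window lengths, and two-tone test signal are illustrative assumptions):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 10000.0  # Hz; hypothetical sampling rate
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 720 * t)  # toy signal

# Wideband: short (~3 ms) window -> ~300 Hz effective bandwidth,
# good time resolution (voicing striations), poor frequency resolution.
f_wb, t_wb, S_wb = spectrogram(speech, fs, window="hamming",
                               nperseg=32, noverlap=24)

# Narrow-band: long (~33 ms) window -> ~30 Hz effective bandwidth,
# resolving individual harmonics at the cost of time resolution.
f_nb, t_nb, S_nb = spectrogram(speech, fs, window="hamming",
                               nperseg=330, noverlap=300)
```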