The MIT Encyclopedia of Communication Disorders: The Singing Voice

The functioning of the voice organ in singing is similar to that in speech. The origin of the sound is the voice source, that is, the pulsating airflow through the glottis. The voice source is mainly controlled by three physiological factors: subglottal pressure, the length and stiffness of the vocal folds, and the degree of glottal adduction. These control parameters determine vocal loudness, F0, and mode of phonation, respectively. The voice source is a complex tone composed of a series of harmonic partials whose amplitudes decrease by about 12 dB per octave as measured in flow units. It propagates through the vocal tract and is thereby filtered in a manner determined by the tract's resonance, or formant, frequencies. These frequencies are determined by the vocal tract shape. For most vowel sounds, the two lowest formant frequencies determine vowel quality, while the higher formant frequencies contribute to personal voice characteristics.
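As a rough numerical illustration of the source spectrum described above, the following Python sketch lists the level of each harmonic partial relative to the fundamental. It assumes an idealized, perfectly uniform roll-off of 12 dB per octave. The function name and the 110 Hz fundamental are illustrative choices, not values from the text.

```python
import math

def source_partial_levels_db(n_partials, rolloff_db_per_octave=12.0):
    """Return the level in dB of partials 1..n relative to the fundamental.

    Partial number n lies log2(n) octaves above the fundamental, so an
    idealized uniform roll-off lowers it by rolloff * log2(n) dB.
    """
    return [0.0 - rolloff_db_per_octave * math.log2(n)
            for n in range(1, n_partials + 1)]

f0_hz = 110.0  # illustrative fundamental, not a value from the text
levels = source_partial_levels_db(8)
for n, level in enumerate(levels, start=1):
    print(f"partial {n}: {n * f0_hz:6.0f} Hz  {level:6.1f} dB")
```

Under this assumption the second, fourth, and eighth partials lie 12, 24, and 36 dB below the fundamental. Real voice source spectra only approximate such a slope.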

Breathing

Subglottal pressure determines vocal loudness and is therefore used for expressive purposes in singing. It is also varied with F0, such that higher pitches are sung with higher subglottal pressures than lower pitches. As a consequence, singers need to vary subglottal pressure constantly, adapting it to both loudness and pitch. This is in sharp contrast to speech, where subglottal pressure is much more constant. Singers therefore need to develop a virtuosic control of the breathing apparatus. In addition, subglottal pressures in singing are varied over a larger range than in speech. Thus, while in loud speech subglottal pressure may be raised to 1.5 or 2 kPa, singers may use pressures as high as 4 or 6 kPa for loud tones sung at high pitches.

Subglottal pressure is determined by active forces produced by the breathing muscles and passive forces produced by gravity and the elasticity of the breathing apparatus. Elasticity, generated by the lungs and the rib cage, varies with lung volume. At high lung volumes, elasticity produces an exhalatory force that may amount to 3 kPa or more. At low lung volumes, elasticity contributes an inhalatory force. Whereas in conversational speech, no more than about 15%–20% of the total lung capacity is used, classically trained singers use an average lung volume range that is more than twice as large and occasionally may vary from 100% to 0% of the total vital capacity in long phrases.

As the elasticity forces change from exhalatory at high lung volumes to inhalatory at low lung volumes, they reach an equilibrium at a certain lung volume. This lung volume is called the functional residual capacity (FRC). In tidal breathing, inhalations are started from FRC. In both speech and singing, lung volumes above FRC are preferred. Because much higher lung volumes are used in singing than in speech, singers need to deal with much greater exhalatory elasticity forces.

Voice Source

The airflow waveform of the voice source is characterized by quasi-triangular pulses, produced when the vocal folds open the glottis, and followed by horizontal portions near or at zero airflow, produced when the folds close the glottis more or less completely (Fig. 1).

Figure 1. Typical flow glottogram showing transglottal airflow versus time.


The acoustic significance of the waveform is straightforward. The slope of the source spectrum is determined mainly by the negative peak of the differentiated flow waveform, frequently referred to as the maximum flow declination rate. It represents the main excitation of the vocal tract. The maximum flow declination rate is linearly related to the subglottal pressure in such a way that a doubling of subglottal pressure causes an SPL increase of about 10 dB. The amplitude gain of higher partials is greater than that of lower partials. Thus, if the sound level of a vowel sound is increased by 10 dB, the partials near 3 kHz typically increase by about 17 dB.
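The two rules of thumb in this paragraph can be turned into a small calculation. The Python sketch below assumes that "about 10 dB per doubling" may be extended to other pressure ratios on a logarithmic scale, and that the 17 dB to 10 dB relation for partials near 3 kHz scales proportionally. Both extrapolations are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def spl_gain_db(p_from_kpa, p_to_kpa, db_per_doubling=10.0):
    """Approximate SPL change when subglottal pressure changes.

    Assumes each doubling of pressure adds about db_per_doubling dB.
    """
    if p_from_kpa <= 0 or p_to_kpa <= 0:
        raise ValueError("pressures must be positive")
    return db_per_doubling * math.log2(p_to_kpa / p_from_kpa)

def gain_near_3khz_db(overall_gain_db, ratio=17.0 / 10.0):
    """Approximate level change of partials near 3 kHz.

    Assumes the '10 dB overall gives about 17 dB near 3 kHz' relation
    scales proportionally for other overall gains.
    """
    return overall_gain_db * ratio

# Loud speech (about 2 kPa) versus a loud, high sung tone (about 4 kPa).
overall = spl_gain_db(2.0, 4.0)
print(f"overall gain: {overall:.1f} dB, near 3 kHz: {gain_near_3khz_db(overall):.1f} dB")
```

The numbers are only as good as the rule of thumb they come from. They show the order of magnitude, not a measured value for any particular singer.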

The air volume contained in a flow pulse is decisive for the amplitude of the source spectrum fundamental and is strongly influenced by the overall glottal adduction force. Thus, for a given subglottal pressure, a firmer adduction produces a smaller air volume in a pulse, which reduces the amplitude of the fundamental. An exaggerated glottal adduction thus attenuates the fundamental. This phonation mode is generally referred to as hyperfunctional or pressed. The opposite extreme, the habitual use of too weak an adduction, is called hypofunctional and prevents the vocal folds from closing the glottis completely during the vibratory cycle. As a result, airflow escapes the glottis during the quasi-closed phase. This generates noise and produces a strong fundamental. This phonation mode is often referred to as breathy.

In classical singing, pressed phonation is typically avoided. Instead, singers seem to strive to reduce glottal adduction to the minimum that will still result in glottal closure during the closed phase. This generates a source spectrum with strong high partials and a strong fundamental. This type of phonation has been called flow phonation or resonant voice. In nonclassical singing, on the other hand, pressed phonation is occasionally used for high, loud tones, apparently for expressive purposes.

A main characteristic of classical singing is the vibrato. It corresponds to a quasi-periodic modulation of F0 (Fig. 2). The pitch perceived from a vibrato tone corresponds to its average F0. The modulation frequency, mostly between 5 and 7 Hz, is generally referred to as the vibrato rate and is rather constant for a singer. Curiously enough, however, it tends to increase somewhat toward the end of tones. The peak-to-peak modulation range varies between zero and somewhat less than two semitones, that is, an F0 ratio of up to 2^(1/6). With increasing age, singers' vibrato rates tend to decrease by about one-half hertz per decade, and vibrato extent tends to increase by about 15 cents per decade.

Figure 2. Example of vibrato.
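A vibrato of this kind can be sketched numerically. The Python snippet below models the F0 contour as a sinusoidal modulation on a logarithmic (semitone) scale, which is a simplifying assumption, since real vibrato is only quasi-periodic. The function names, the 220 Hz mean F0, and the default rate and extent are illustrative choices within the ranges given in the text.

```python
import math

def vibrato_f0_hz(t_s, mean_f0_hz, rate_hz=6.0, extent_semitones=1.0):
    """F0 at time t_s for an idealized sinusoidal vibrato.

    extent_semitones is the peak-to-peak modulation range, so F0 swings
    half of it above and half of it below the mean on a semitone scale.
    """
    deviation = 0.5 * extent_semitones * math.sin(2 * math.pi * rate_hz * t_s)
    return mean_f0_hz * 2 ** (deviation / 12)

def peak_to_peak_ratio(extent_semitones):
    """Ratio between the highest and lowest F0 in the vibrato cycle."""
    return 2 ** (extent_semitones / 12)

# Sample one vibrato cycle at 6 Hz around an illustrative mean F0 of 220 Hz.
contour = [vibrato_f0_hz(k / 600.0, 220.0) for k in range(100)]
print(f"min {min(contour):.1f} Hz, max {max(contour):.1f} Hz")
print(f"two-semitone extent ratio: {peak_to_peak_ratio(2):.4f}")
```

With a two-semitone extent the ratio between the highest and lowest F0 is 2^(1/6), or roughly 1.12, which is the upper limit mentioned in the text.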


The vibrato is generated by a rhythmical pulsation of the cricothyroid muscles. When contracting, these muscles cause a stretching of the vocal folds, and so raise F0. The neural origin of this pulsation is not understood. One possibility is that it emanates from a cocontraction of the cricothyroid and vocalis muscles.

In speech, pitch is perceived in a continuous fashion, such that a continuous variation in F0 is heard as a continuous variation of pitch. In music, on the other hand, pitch is perceived categorically, where the categories are scale tones or the intervals between them. Thus, the F0 continuum is divided logarithmically into a series of bins, each of which corresponds to a scale tone. Each bin is approximately 6% wide, and the center frequency of a scale tone is a factor of 2^(1/12) higher than that of its lower neighbor.
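The logarithmic division into bins can be made concrete with a short Python sketch. It assigns an F0 value to the nearest scale tone, assuming equal-tempered tuning with a ratio of 2^(1/12) between neighboring tones and a reference of A4 = 440 Hz. The reference pitch and the function names are assumptions for illustration; the text does not specify a tuning reference.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; not given in the text

def nearest_scale_tone(f0_hz, ref_hz=A4_HZ):
    """Return (semitone_index, center_hz) of the bin that contains f0_hz.

    semitone_index counts equal-tempered semitones relative to ref_hz.
    """
    index = round(12 * math.log2(f0_hz / ref_hz))
    center_hz = ref_hz * 2 ** (index / 12)
    return index, center_hz

def bin_edges(center_hz):
    """Lower and upper edge of a bin, half a semitone on either side."""
    half_semitone = 2 ** (1 / 24)
    return center_hz / half_semitone, center_hz * half_semitone

index, center = nearest_scale_tone(262.0)
low, high = bin_edges(center)
print(index, round(center, 2), round(100 * (high / low - 1), 2))
```

The relative width of each bin, high / low - 1, comes out just under 6%, in line with the figure quoted above.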

The demands for pitch accuracy are quite high in singing. Experts generally find that a tone is out of tune when it deviates from the target F0 by more than about 7 cents, or 0.07 of a semitone interval. This corresponds to less than one-tenth of a typical vibrato extent. The target F0 generally agrees with equal-tempered tuning, where the interval between adjacent scale tones corresponds to the F0 ratio of 1:2^(1/12). However, apparently depending on the musical context, the target F0 for a scale tone may deviate from its value in equal-tempered tuning by about a tenth of a semitone interval.
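The cent is one hundredth of an equal-tempered semitone, so a deviation in cents is 1200 times the base-2 logarithm of the frequency ratio. The Python sketch below applies the roughly 7-cent tolerance from the text. The function names and the 440 Hz target used in the example are illustrative assumptions.

```python
import math

def deviation_cents(f0_hz, target_hz):
    """Signed deviation of f0_hz from target_hz in cents."""
    return 1200 * math.log2(f0_hz / target_hz)

def is_out_of_tune(f0_hz, target_hz, tolerance_cents=7.0):
    """True if the deviation exceeds the tolerance quoted in the text."""
    return abs(deviation_cents(f0_hz, target_hz)) > tolerance_cents

# Illustrative target of 440 Hz (an assumed value, not from the text).
for f0 in (441.0, 442.0):
    print(f0, round(deviation_cents(f0, 440.0), 2), is_out_of_tune(f0, 440.0))
```

At a 440 Hz target, the 7-cent tolerance amounts to less than 2 Hz in either direction, which gives a sense of how narrow the margin is.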

Resonance

The formant frequencies in classical singing differ between voice classifications. Thus, basses have lower formant frequencies than baritones, who have lower formant frequencies than tenors. These differences probably reflect differences in vocal tract length. The formant frequencies also deviate from those typically found in speech. For example, the second formant of the vowel [i] is generally considerably lower in classical singing than in speech, such that the vowel quality approaches that of the vowel [y].

These formant frequency deviations are related to the singer's formant, a marked spectrum envelope peak between approximately 2.5 and 3 kHz, that appears in all voiced sounds produced by classically trained male singers and altos (Fig. 3). It is produced by a clustering of F3, F4, and F5. This clustering seems to be achieved by combining a narrow opening of the larynx tube with a wide pharynx. If the area ratio between the larynx tube opening and the pharynx approximates 1:6 or less, the larynx tube acts as a separate resonator in the sense that its resonance frequency is rather insensitive to the cross-sectional area in the remaining parts of the vocal tract. Its resonance frequency can be somewhere between 2.5 and 3 kHz. If this resonance is appropriately tuned, it will provide a formant cluster.

Figure 3. Spectra of the vowel [u] as spoken and sung by a classically trained baritone singer.


A common method to achieve a wide pharynx seems to be to lower the larynx, which is typically observed in classically trained singers. Lowering the larynx lengthens the pharynx cavity. As F2 of the vowel [i] is mainly dependent on the pharynx length, it will be lowered by a lowering of the larynx. In nonclassical singing, more speechlike formant frequencies are used, and no singer's formant is produced.

The center frequency of the singer's formant varies between voice classifications. On average, it tends to be about 2.4, 2.6, and 2.8 kHz for basses, baritones, and tenors, respectively. These differences, which contribute significantly to the characteristic voice qualities of these classifications, probably reflect differences in vocal tract length.

The singer's formant spectrum peak is particularly prominent in bass and baritone singers. In tenors and altos it is less prominent and in sopranos it is generally not observable. Thus, sopranos do not seem to produce a singer's formant.

The singer's formant seems to serve the purpose of enhancing the voice when accompanied by a loud orchestra. The long-term-average spectrum of a symphonic orchestra typically shows a peak near 0.5 kHz followed by a descent of about 9 dB per octave toward higher frequencies (Fig. 4). Therefore, the sound level of an orchestra is comparatively low in the frequency region of the singer's formant, so that the singer's formant makes the singer's voice easier to perceive. As the singer's formant is produced mainly by vocal tract resonance, it can be regarded as a manifestation of vocal economy. It does not appear in nonclassical singing, where the soloist is provided with a sound amplification system that takes care of audibility problems. Also, it is absent or much less prominent among choral singers, whose voices are supposed to blend, such that individual singers' voices are difficult to discern.

Figure 4. Long-term-average spectrum of an orchestra with and without a tenor soloist (heavy solid and dashed curves). The thin solid curve shows a rough approximation of a corresponding analysis of neutral speech at conversational loudness.
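To see why the region of the singer's formant is favorable, the orchestral spectrum can be approximated by the two numbers given in the text: a peak near 0.5 kHz and a descent of about 9 dB per octave above it. The Python sketch below is a deliberately crude straight-line model built on those two numbers. It says nothing about the spectrum below the peak, and the function name is hypothetical.

```python
import math

def orchestra_level_db(freq_hz, peak_hz=500.0, descent_db_per_octave=9.0):
    """Approximate orchestral level relative to the spectrum peak, in dB.

    Straight-line model on a log-frequency axis, valid only at or
    above the peak frequency.
    """
    if freq_hz < peak_hz:
        raise ValueError("model only covers frequencies at or above the peak")
    return 0.0 - descent_db_per_octave * math.log2(freq_hz / peak_hz)

# Average singer's formant center frequencies quoted in the text.
for voice, center_hz in (("bass", 2400.0), ("baritone", 2600.0), ("tenor", 2800.0)):
    print(f"{voice}: {orchestra_level_db(center_hz):.1f} dB re peak")
```

Under this model the orchestra is roughly 20 dB or more below its own peak in the 2.4 to 2.8 kHz region, which is consistent with the observation that a spectrum peak there helps the voice stand out.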


A singer's pitch range is about two octaves. Typical ranges for basses, baritones, tenors, altos, and sopranos are E2–E4 (82–330 Hz), G2–G4 (98–392 Hz), C3–C5 (131–523 Hz), F3–F5 (175–698 Hz), and C4–C6 (262–1047 Hz), respectively. This implies that F0 is often higher than the typical value of F1 in some vowels. Singers, however, seem to avoid the situation in which F0 is higher than F1. Instead, they increase F1 so that it is always higher than F0. For the vowel [a], this is achieved by widening the jaw opening; the higher the pitch, the wider the jaw opening. For other vowels, singers seem first to reduce the tongue constriction of the vocal tract, and resort to widening the jaw opening when this neutralization of the articulation fails to raise F1 any further.
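The frequencies quoted for these ranges follow from equal-tempered tuning. The Python sketch below converts the note names to hertz, assuming a reference of A4 = 440 Hz, which the text does not state but which reproduces its rounded figures. The function and table names are illustrative, and only natural note names are handled.

```python
# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name, a4_hz=440.0):
    """Convert a natural note name such as 'E2' to Hz in equal temperament.

    a4_hz is an assumed tuning reference.
    """
    letter, octave = name[0], int(name[1:])
    semitones_from_a4 = NOTE_OFFSETS[letter] + 12 * (octave - 4) - 9
    return a4_hz * 2 ** (semitones_from_a4 / 12)

# Typical ranges as listed in the text.
RANGES = {
    "bass": ("E2", "E4"),
    "baritone": ("G2", "G4"),
    "tenor": ("C3", "C5"),
    "alto": ("F3", "F5"),
    "soprano": ("C4", "C6"),
}

for voice, (low, high) in RANGES.items():
    print(f"{voice}: {round(note_to_hz(low))}-{round(note_to_hz(high))} Hz")
```

Each range spans exactly two octaves, so the upper limit is four times the lower one.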

Because F1 and F2 are decisive for the perception of vowels, the substantial departures from the typical formant frequency values in speech affect vowel identification. Yet vowel identification is surprisingly successful even at high F0. Most isolated vowel sounds can be correctly identified up to an F0 of about 500 Hz. Above this frequency, identification deteriorates quickly and remains low for most vowels at F0 higher than 700 Hz. In reality, however, text intelligibility can be greatly improved by consonants.

Health Risks

Because singers are extremely dependent on the perfect functioning of their voices, they often need medical attention. A frequent origin of their voice disorders is a cold, which typically causes dryness of the glottal mucosa. This disturbs the normal function of the vocal folds. Their use of high subglottal pressures may also be relevant. An inappropriate vocal technique, sometimes associated with a habitually exaggerated glottal adduction or with singing in too high a pitch range, also tends to cause voice disorders, which in some cases may lead to the development of vocal nodules. Such nodules generally disappear after voice rest, and surgical treatment is mostly considered inappropriate.

See also voice acoustics.

 


© 2010 The MIT Press