Introduction
Cochlear implants (CIs) are electronic auditory prostheses developed to enable individuals with severe to profound hearing impairment to perceive speech and understand spoken language. A CI consists of an external microphone and speech processor that convert sound into an electrical signal, and an external transmitter that relays this signal to an internal receiver, which in turn delivers it to an array of electrodes in the cochlea that stimulate the auditory nerve. Although CIs work well in many patients, the benefits to individual users vary substantially. Auditory-alone (A-alone) performance measures have demonstrated that some users of CIs are able to communicate successfully using speech over a telephone even when lipreading cues are unavailable (e.g., Dorman, Dankowski, McCandless, Parkin, & Smith, 1991). Other users show little benefit in open-set speech perception tests under A-alone listening conditions but report that the CI helps them understand speech when visual information is also available in face-to-face conversation.
One source of these individual differences is undoubtedly the way in which the surviving neural elements in the cochlea are stimulated with electrical currents provided by the speech processor (Fryauf-Bertschy, Tyler, Kelsay, Gantz, & Woodworth, 1997). Other sources of individual differences, however, result from the way in which these initial sensory inputs are coded and processed by higher cortical centers in the auditory system (Pisoni, 2000; Pisoni, Cleary, Geers, & Tobey, 2000). In this chapter, we present a review and theoretical interpretation of new findings on the central cognitive factors related to audiovisual (AV) speech perception that contribute to the individual differences in outcome and benefit with CIs in profoundly deaf adults and children.
Although there are many important clinical reasons for investigating the basis of the variability in outcome measures of speech and language among deaf adults and children who have received CIs, there are also several theoretical reasons, having to do with issues in neural and behavioral development, for carrying out research in this unique population. Deaf adults and children who receive CIs afford an unusual opportunity to study the effects of auditory deprivation on sensory, perceptual, and cognitive development, specifically the cognitive mechanisms that underlie the development of speech and language processing skills after the introduction of novel auditory input to the nervous system. For ethical reasons, it is not possible to carry out sensory deprivation experiments with adults and children, and it is impossible to delay or withhold medical treatment for an illness or disability that has been identified and diagnosed. Thus, for studies of this kind in humans, it becomes necessary to rely on clinical populations that are receiving medical interventions of various kinds and to hope that appropriate experimental designs can be developed that will yield new scientific knowledge and understanding about the basic underlying neural mechanisms and processes.
Fortunately, in everyday experience, speech communication is not limited to input from the auditory sensory modality alone. Visual information about speech articulation and spoken language obtained from lipreading has been shown to improve speech understanding in listeners with normal hearing (Erber, 1969; Sumby & Pollack, 1954), listeners with hearing loss (Erber, 1972, 1975), and hearing-impaired listeners with CIs (Tyler, Parkinson, Woodworth, Lowder, & Gantz, 1997). Although lipreading cues have been shown to enhance speech perception, the sensory, perceptual, and cognitive processes underlying the multisensory gain in performance are not well understood, especially in special clinical populations, such as adults and children who are hearing-impaired and subsequently acquire or reacquire hearing via CIs.
In one of the first studies to investigate AV speech perception, Sumby and Pollack (1954) demonstrated that visual cues to speech greatly enhance speech intelligibility for normal hearing (NH) listeners, especially when the acoustic signal is masked by noise. They found that performance on several closed-set word recognition tasks increased substantially under AV presentation compared to A-alone presentation. This increase in performance was comparable to the gain observed when the auditory signal was increased by about 15 dB SPL under A-alone perception conditions (Summerfield, 1987). Since this research was reported almost 50 years ago, numerous other studies have demonstrated that visual information from lipreading improves speech perception performance over A-alone conditions in NH adults (Massaro & Cohen, 1995) as well as in adults and children with varying degrees of hearing impairment (Erber, 1975, 1979; Geers, 1994; Geers & Brenner, 1994; Geers, Brenner, & Davidson, 2003; Grant, Walden, & Seitz, 1998; Massaro & Cohen, 1999).
Cross-modal speech perception and the cognitive processes by which individuals combine and integrate auditory and visual speech information with lexical and syntactic knowledge have become major areas of research in the field of speech perception (e.g., Massaro, 1998; Massaro & Cohen, 1995). AV speech perception appears to be much more complicated than just the simple addition of auditory and visual cues to speech (e.g., Massaro & Cohen, 1999). That is, the gain in performance obtained from combined AV information is superadditive. For example, Sumby and Pollack (1954) and Erber (1969) observed that although speech intelligibility decreased in A-alone conditions when the signal-to-noise ratio (S/N) decreased, word recognition dramatically increased under AV conditions, with the visual contribution increasing in importance when the speech signal was less audible.
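One commonly used way to quantify this benefit in the AV speech perception literature normalizes the gain from adding vision by the room left for improvement under A-alone listening; the expression below is offered only as an illustrative sketch of such a metric, not as the specific measure used in the studies reviewed in this chapter:

R = (AV − A) / (1 − A),

where A and AV are the proportions of words correctly recognized in the A-alone and AV conditions, respectively, and R expresses the visual benefit relative to the improvement still available. For example, an increase from A = .40 to AV = .70 and an increase from A = .80 to AV = .90 both yield R = .50, even though the absolute gains differ (.30 vs. .10); a constant relative visual contribution therefore produces larger absolute gains as the speech signal becomes less audible, consistent with the pattern reported by Sumby and Pollack (1954) and Erber (1969).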
A well-known example of visual bias in AV speech perception is the McGurk effect (McGurk & MacDonald, 1976). When presented with an auditory /ba/ stimulus and a visual /ga/ stimulus simultaneously, many listeners report hearing an entirely new stimulus, a perceptual /da/. Thus, information from separate sensory modalities is combined to produce percepts that differ predictably from either the auditory or the visual signal alone. However, these findings are not observed in all individuals (see Massaro & Cohen, 2000). Grant and Seitz (1998) suggested that listeners who are more susceptible to the McGurk effect also are better at integrating auditory and visual speech cues. Grant et al. (1998) proposed that some listeners could improve consonant perception skills by as much as 26% by sharpening their integration abilities. Their findings on AV speech perception may have important clinical implications for intervention and aural rehabilitation strategies with deaf and hearing-impaired listeners because consonant perception accounted for approximately half of the variance of word and sentence recognition in their study.
In an important theoretical paper on AV speech perception, Summerfield (1987) identified two major reasons to study AV integration in speech perception. First, studies of AV integration allow researchers to investigate the independent and joint contributions of the auditory and visual modalities. Second, AV integration occurs to varying degrees in all perceivers. Even NH listeners benefit from lipreading cues when they are required to recognize speech in background noise or under other degraded listening conditions. Moreover, many hearing-impaired listeners depend heavily on lipreading cues for everyday communication purposes without the availability of reliable auditory information to support speech perception. Thus, the study of cross-modal speech perception and AV integration may provide speech researchers with new insights into the fundamental cognitive processes used in spoken language processing.
Summerfield (1987) proposed five mechanisms that could account for AV speech perception. These mechanisms ranged from the “vision:place, audition:manner” or VPAM hypothesis, in which place evidence is categorized by the visual modality and manner evidence is categorized by the auditory modality before the information is integrated, to a modality-free dynamical representation of AV speech perception. At the present time, the debate concerning the underlying neural and perceptual mechanisms of AV integration is unresolved, with some researchers supporting modality-neutral, early-integration theories of AV speech perception (e.g., Fowler, 1986; Rosenblum, in press; Rosenblum & Gordon, 2001) and others arguing for modality-specific, late-integration theories (e.g., Bernstein, Auer, & Moore, Chap. 13, this volume) of AV speech perception. Our recent findings, reported later in this chapter, on deaf adults and children who received CIs add new knowledge and may inform this debate by determining the relative contributions of visual and severely compromised auditory modalities when the auditory modality is restored with a CI.
Because postlingually deaf adults have to readjust their perceptual strategies to compensate for the loss of hearing or degraded auditory input, and may have to rely more heavily on visual cues to spoken language than NH adults, it seems reasonable to suppose that hearing loss alone would naturally lead to increased lipreading abilities. However, research investigating this hypothesis has led to inconclusive results. Some investigators have found no lipreading advantage for hearing-impaired adults (e.g., Massaro, 1987; Mogford, 1987; Rönnberg, 1995), whereas other researchers have found better lipreading performance in some deaf adults than in NH adults (Bernstein, Demorest, & Tucker, 2000). In a recent study, Bernstein, Auer, and Tucker (2001) reported that NH adults who were self-selected as “good lipreaders” could actually lipread words in sentences just as well as deaf adults. However, the deaf adults' lipreading accuracy was superior to that of NH adults only when the sentences were scored in terms of phonemes correct, which reflects the use of partial stimulus information.
One reason for the equivocal results on this issue is that there is enormous variability in the lipreading skills of the deaf adult population. Such variability in deaf adults' lipreading skills may be explained by the age at onset of the hearing loss: adults who experienced a hearing loss earlier in life may be better lipreaders than adults who acquired a hearing loss later in life, simply because they have had more exposure and experience with the visual properties of speech articulation and were forced to actively code and process visual speech cues to engage in meaningful speech communication activities on a daily basis. In fact, it has been reported that many deaf adults with early-onset deafness are better lipreaders than NH adults (see Bernstein et al., 2000). However, there has been very little research conducted on this problem or the underlying factors that are responsible for these individual differences in lipreading skills.
In a study on the effects of age at onset of deafness, Tillberg, Rönnberg, Svärd, and Ahlner (1996) measured speech perception in A-alone, visual-alone (V-alone), and AV modalities in adults with early-onset versus late-onset deafness and found no differences in performance between the two groups in A-alone and AV conditions. However, they did find better performance for adults with early-onset deafness than for adults with late-onset deafness in the V-alone condition. The differences in lipreading performance between the early-onset and late-onset groups most likely were due to experience and activities with visual speech cues. That is, adults with an early-onset hearing loss spend a greater proportion of their lives using visual rather than auditory cues to perceive speech, whereas adults with a late-onset hearing loss depend primarily on auditory cues to speech for a longer time and thus have much less lipreading experience overall.
Among audiologists and speech and hearing scientists who work with the deaf, it is generally assumed that the primary modality for speech perception is different for hearing-impaired and NH populations (e.g., Erber, 1969; Gagné, 1994; Seewald, Ross, Giolas, & Yonovitz, 1985). The primary modality for speech perception is audition for NH people, but it is vision for hearing-impaired people. In fact, CIs were historically thought of as sensory aids to lipreading, with vision considered to be the primary modality for speech perception and spoken language in this population (Gantz et al., 1988; Tyler et al., 1985; Tyler, Tye-Murray, & Lansing, 1988). Early single-channel CIs provided minimal auditory information about speech and spoken language and coded only duration and amplitude, with little if any of the fine spectral detail needed for phonetic perception and spoken word recognition (Gantz, Tye-Murray, & Tyler, 1989). However, even the addition of subthreshold auditory signals or gross acoustic cues such as duration, amplitude, and fundamental frequency significantly improves speech recognition scores over lipreading alone (e.g., Boothroyd, Hnath-Chisolm, Hanin, & Kishon-Rabin, 1988; Breeuwer & Plomp, 1986; Erber, 1969, 1979).
As CI technology has advanced over the years, so have the speech perception abilities of hearing-impaired listeners who have received CIs (Fryauf-Bertschy, Tyler, Kelsay, & Gantz, 1992; Fryauf-Bertschy et al., 1997; Miyamoto, Kirk, et al., 1997; Miyamoto, Svirsky, & Robbins, 1997; Waltzman et al., 1997). If deaf adults who depend primarily on the visual sensory channel for speech perception are better lipreaders than NH adults, what happens to their lipreading abilities once the auditory channel is restored via a CI after a period of deafness? Does reliance on the primary sensory channel change and reorganize following implantation? It could be that adult CI recipients still depend primarily on visual speech cues and merely supplement this information with the auditory input from the implant. It is also conceivable that postlingually deaf adults have the opportunity to revert to the auditory channel as their primary input for speech perception, but whether this occurs will depend on the nature of the sensory information provided by the CI.
Studies of postlingually deaf adults have assumed that AV integration ability is intact prior to onset of deafness. How did adults acquire this ability in the first place? Are hearing and vision both initially necessary to be able to integrate A and V information? It has been commonly thought that postlingually deaf children and adults without an auditory communication channel must use lipreading as their primary communication channel, but relatively little is actually known about the lipreading skills of prelingually, profoundly deaf children who have never experienced sound or had any exposure to speech or spoken language via the auditory sensory modality prior to receiving a CI.
A number of years ago, Seewald et al. (1985) reported that differential reliance on either audition or vision as the primary modality for speech perception was related to level of hearing impairment in prelingually deaf children. When faced with conflicting auditory and visual information in a speech perception test, children who had higher hearing thresholds were biased toward the visual input, whereas children with lower hearing thresholds were biased toward auditory input. More recent research on prelingually deaf children with CIs has shown that combined AV information in tests of spoken language recognition improves performance over A-alone and V-alone conditions (Geers, 1994; Geers & Brenner, 1994; Geers et al., 2003; Staller, Dowell, Beiter, & Brimacombe, 1991; Tyler, Fryauf-Bertschy, et al., 1997; Tyler, Opie, Fryauf-Bertschy, & Gantz, 1992). But how do profoundly deaf children acquire lipreading skills in the first place when they have received little if any auditory input during the critical period for language development? How do congenitally deaf children recover the articulatory gestures of the talker and use that source of information to recognize words and understand spoken language?
In this chapter, we review recent findings on the AV speech perception skills of deaf adults and children who have received CIs, and provide an interpretation of the results in terms of perceptual learning and the effects of experience on perceptual and linguistic development. Our findings from studies of both postlingually deaf adults and prelingually deaf children offer some new insights into the development of multimodal speech perception after hearing has been restored via a CI. These results also contribute to the current theoretical debate about the nature of the processes underlying AV speech perception.