The Handbook of Multisensory Processes
Audiovisual Speech Binding: Convergence or Association?
Introduction

Over the past several decades, behavioral studies in experimental psychology have revealed several intriguing audiovisual (AV) speech perception effects. Under noisy acoustic conditions, being able to see a talker results in substantial gains in speech comprehension, with gains estimated to be equivalent to raising the acoustic signal-to-noise ratio by approximately 11 dB (MacLeod & Summerfield, 1987; Sumby & Pollack, 1954). When speech is severely degraded by filtering out various frequency bands, being able to see the talker restores a significant amount of speech information (Grant & Walden, 1996). Extremely minimal auditory speech information can combine with visible speech information to produce superadditive levels of performance. For example, an acoustic signal encoding only the voice fundamental frequency does not by itself support word recognition, but combining the talker's face with the voice fundamental frequency yields dramatic enhancements over lipreading alone (e.g., 30% words correct with lipreading alone versus 80% words correct when the fundamental frequency is added; Boothroyd, Hnath-Chisolm, Hanin, & Kishon-Rabin, 1988; Breeuwer & Plomp, 1985; Kishon-Rabin, Boothroyd, & Hanin, 1996).

The AV effect that has been of greatest interest, and for which a substantial perception literature now exists, is obtained with a variety of incongruent (mismatched) pairings of auditory and visual spoken syllables (e.g., Green & Kuhl, 1989; Massaro, 1987; Massaro, Cohen, & Smeele, 1996; Munhall, Gribble, Sacco, & Ward, 1996; Saldaña & Rosenblum, 1994; Sekiyama, 1997; Walker, Bruce, & O'Malley, 1995). When incongruent AV stimuli are presented, the perceived speech stimulus frequently differs from what is perceived in the absence of the visual stimulus (Green, 1998; McGurk & MacDonald, 1976). An example of this so-called McGurk effect occurs when an auditory /ba/ is dubbed onto a visual /ga/ and listeners report hearing /da/. In addition to blend effects such as this one, combination percepts arise, such as /bga/ in response to auditory /ga/ and visual /ba/. Other effects have been obtained with incongruent syllables, several of which are described later in this chapter.

The speech perception literature offers several alternative perceptual theories to account for AV effects. Conveniently, recent developments in functional brain imaging and in recordings of cortical event-related potentials (ERPs) and event-related fields afford the opportunity to investigate whether perceptual theories are supported at the level of neural implementation. For example, theoretical accounts suggesting that AV speech integration occurs early, at a subsegmental (subphonemic) level, could be taken to imply that auditory and visual speech information combines in the central nervous system (CNS) at early levels of the cortical synaptic hierarchy and/or at early latencies. That is, the linguistic level at which perceptual effects are theorized to occur can suggest the relevant level of CNS processing. However, the brain mechanisms responsible for speech processing are complex and nonlinear, and the expectation that the translation between perceptual theory and neural implementation will be simple is likely to be overly optimistic (see Friston et al., 1996; Picton, Alain, Otten, Ritter, & Achim, 2000). In addition, perceptual theories afford considerable leeway in their translation to possible theories of neural implementation. This chapter outlines AV speech perception research and theories, adopting a fundamental explanatory distinction that has parallels at the level of plausible neural mechanisms for binding auditory and visual speech information. The chapter also outlines those plausible mechanisms. The goal of the chapter is to provide a particular integrated view of the perceptual and neural levels of explanation.

 