Introduction
For readers of a book on multimodal perception, it probably comes as no surprise that most events in real life provide perceptual input in more than one modality and that the sensory modalities may influence each other. For example, a speaker seen face to face provides not only auditory information, conveyed by what is said, but also visual information, conveyed through the movements of the lips, face, and body, as well as visual cues about the origin of the sound. Most handbooks on cognitive psychology pay comparatively little attention to this multimodal state of affairs, and the different senses (vision, hearing, smell, taste, and touch) are treated as distinct and separate modules with little or no interaction. It is becoming increasingly clear, however, that when the different senses receive correlated input about the same external object or event, the perceptual system often combines that information to yield a multimodally determined percept.
An important issue is how to characterize such multisensory interactions and their cross-modal effects. At least three different notions are at stake here. The first is that information is processed in a hierarchical and strictly feed-forward fashion: information from the different sensory modalities converges onto a multimodal representation. For example, in the fuzzy logical model of perception (Massaro, 1998), degrees of support for different alternatives are determined separately for each modality, say audition and vision, and then combined to give an overall degree of support (a simplified formulation is sketched below). Because information is propagated strictly feed-forward, higher-order multimodal representations do not affect lower-order, sensory-specific representations. There is thus no cross-talk between the sensory modalities such that, say, vision affects early processing stages of audition or vice versa; in feed-forward models, cross-modal interactions take place only at or beyond multimodal stages. A second possibility is that multimodal representations send feedback to primary sensory levels (e.g., Driver & Spence, 2000). In this view, higher-order multimodal levels can affect sensory levels: vision might affect audition, but only via a multimodal representation. A third possibility is that cross-modal interactions take place without multimodal representations, for example because the senses access each other directly from their sensory-specific systems (e.g., Ettlinger & Wilson, 1990). Vision might then affect audition without the involvement of any multimodal representation (see Falchier, Clavagnier, Barone, & Kennedy, 2002, for recent neuroanatomical evidence of projections from primary auditory cortex to visual area V1).
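To make the feed-forward combination concrete, the display below gives a simplified sketch of the multiplicative integration rule commonly associated with the fuzzy logical model of perception; the symbols a_i and v_i (the auditory and visual degrees of support for response alternative i) are our notation rather than Massaro's full formalism:

$$
P(i \mid A, V) = \frac{a_i \, v_i}{\sum_{k} a_k \, v_k}
$$

Because a_i and v_i are fixed before they are combined, the visual evidence can change the final decision but not the auditory support itself; this independence of the sensory evaluations is what makes the model strictly feed-forward.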
The role of feedback in sensory processing has been debated for a long time (e.g., the interactive activation model of reading by Rumelhart and McClelland, 1982). However, as far as cross-modal effects are concerned, there is at present no clear empirical evidence that distinguishes among feed-forward, feedback, and direct-access models. Feed-forward models predict that early sensory processing levels should be autonomous and unaffected by higher-order processing levels, whereas feedback or direct-access models would, in principle, allow vision to affect auditory processes, or vice versa. Although this theoretical distinction seems straightforward, empirical demonstrations in favor of one or another alternative have proved difficult to obtain. One of the main problems is to find measures that are sufficiently unambiguous to be taken as pure indices of an auditory or visual sensory process.
At a minimum, before one can claim that a cross-modal effect has perceptual consequences at early sensory stages, the phenomenon should be (1) robust, (2) not explainable as a strategic effect, and (3) not arising at response-related processing stages. If one assumes stagewise processing, with sensation coming before attention (e.g., the “late selection” view of attention), one might also want to argue that (4) cross-modal effects should be pre-attentive. If these minimal criteria are met, it becomes at least likely that cross-modal interactions occur at early perceptual processing stages, and thus that models allowing access to primary processing levels (i.e., feedback or direct-access models) better describe the phenomenon. In our work on cross-modal perception, we have investigated the extent to which these minimal criteria apply to some cases of audiovisual perception. One case concerns a situation in which vision affects the localization of a sound (the ventriloquism effect); the other, a situation in which an abrupt sound affects visual processing of a rapidly presented visual stimulus (the freezing phenomenon). In Chapter 36 of this volume, we describe the case of cross-modal interactions in affect perception. We consider each of these phenomena to be based on cross-modal interactions that affect early levels of perception.