The Handbook of Multisensory Processes: Cross-Modal Object Recognition
Introduction

Object recognition has traditionally been couched in terms of visual processing. However, in the real world we explore our environment using a variety of modalities, and internal representations of the sensory world are formed by integrating information from different sources. The information arriving through different sensory pathways may be complementary. For object identification, for example, auditory information, such as the moo of a cow, may help to identify the visual entity, the shape of the cow. Furthermore, in order to guide actions and permit interaction with objects, information acquired from the different senses must converge to form a coherent percept.

This chapter reviews the current literature on cross-modal object recognition. Specifically, I consider the nature of the representation underlying each sensory system that facilitates convergence across the senses, and how perception is modified by the interaction of the senses. I will concentrate mainly on the visual and tactile recognition of objects because of my own research interests, although many of the principles of cross-modal—visual and haptic—recognition that I discuss could easily relate to other sensory modalities. The chapter lays out the cortical, behavioral, and experimental correlates of cross-modal recognition. The first section offers some background literature on the neural correlates of cross-modal recognition under conditions of sensory deprivation and multisensory activation. The second section includes a review of the behavioral characteristics of sensory-specific (e.g., visual or tactile) and cross-modal perception. Finally, in the third section I discuss some of our own recent experimental studies on cross-modal recognition of single and multiple objects.

In order to recognize objects, the human visual system is faced with the problem of maintaining object constancy. Specifically, the problem is the following: despite changes in the retinal projection of an object whenever the observer or the object moves, the object representation must remain constant for recognition to occur. In the past, several mechanisms were proposed to allow for object constancy within the visual system. Because our exploration of the environment generally involves more than one modality, however, object constancy could as well be achieved through a multisensory representation of objects. In this way, a change or reduction in information acquired through one sensory modality can be compensated for by information acquired through another modality. Thus, if a cat creeps under a chair and out of sight, it can still be recognized as a cat because of the sound it makes or the way it feels when it rubs against the sitter's feet.

We know very little about how information from different modalities combines to form a single multisensory representation of an object. It might be argued that in order for information to be shared across modalities, the information must be encoded in a similar manner for all modalities—which assumes a functional equivalence among the modalities. That is, although the range and focus of information might be different across the different modalities, the general principles with which information is treated would be the same. For example, vision and haptics can both be seen as image-processing systems, and therefore amenable to similar functional descriptors. However, vision is able to recruit a larger spatial bandwidth in images than the haptic system. Yet as Loomis and others have shown, when the spatial bandwidth of vision is reduced to that of haptics, letter identification performance is equivalent across both senses (see Loomis, 1990, for a discussion and model). Visual and haptic recognition performance are also more similar when the visual field of view is reduced by placing an aperture over a picture, thus simulating haptic encoding (Loomis, Klatzky, & Lederman, 1991). If both systems are amenable to the same functional processing of image information, then we can extend these properties beyond image resolution and viewing window size. Thus, functional similarities between vision and haptics should be observable using behavioral performance measures. Furthermore, multisensory information about an object must be combined at the cortical level, and should be amenable to measurement using brain imaging techniques such as functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and electrophysiological techniques. These issues are discussed in the following sections.
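As a rough illustration of the two stimulus manipulations just described, the sketch below low-pass filters a letter image to approximate the coarser spatial resolution of touch and then restricts the visible region to a small aperture, in the spirit of the Loomis studies cited above. The function names, filter width, aperture size, and toy stimulus are illustrative assumptions, not code or parameters from those experiments.

```python
# Minimal sketch (assumed, not from the chapter): (1) blur an image to mimic
# the reduced spatial bandwidth of touch, and (2) mask it to a small circular
# aperture to mimic the narrow "window" of haptic exploration.
import numpy as np
from scipy.ndimage import gaussian_filter


def reduce_spatial_bandwidth(image: np.ndarray, sigma_px: float = 4.0) -> np.ndarray:
    """Blur the image so detail finer than the assumed tactile limit is lost."""
    return gaussian_filter(image.astype(float), sigma=sigma_px)


def aperture_view(image: np.ndarray, center: tuple[int, int], radius_px: int = 20) -> np.ndarray:
    """Keep only a circular patch around `center`, as in aperture viewing."""
    rows, cols = np.ogrid[: image.shape[0], : image.shape[1]]
    mask = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius_px ** 2
    viewed = np.zeros_like(image, dtype=float)
    viewed[mask] = image[mask]
    return viewed


if __name__ == "__main__":
    # Toy "letter" stimulus: a bright vertical bar on a dark background.
    letter = np.zeros((128, 128))
    letter[20:108, 56:72] = 1.0

    blurred = reduce_spatial_bandwidth(letter, sigma_px=4.0)
    patch = aperture_view(blurred, center=(64, 64), radius_px=20)
    print(blurred.shape, float(patch.max()))
```

Sweeping the blur width or aperture radius in such a simulation is one way to ask, behaviorally, at what point visual identification performance falls to the level typically reported for touch.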

 