Functional organization: category-selective regions
This section describes work that has characterized three distinct regions in the human ventral visual pathway, each of which responds selectively to a single category of visual stimuli (Fig. 79.1).
Figure 79.1.
Three category-selective regions in human extrastriate cortex. The brain images at the left show coronal slices in individual subjects. Overlaid in color are regions that responded significantly more strongly to faces than to objects (the FFA), to scenes than to objects (the PPA), and to body parts than to object parts (the EBA). Each region responds to a wide variety of exemplars of its category; for each area, four examples of such preferred stimuli are shown in the blue box at the top. Examples of nonpreferred stimuli for each area (which elicit about half the response of preferred stimuli, measured as percentage signal increase from a fixation baseline) are shown in the red box at the bottom. (See color plate 55.)
Faces
Faces are enormously rich and biologically relevant stimuli, providing information not only about the identity of a person but also about his or her mood, age, sex, and direction of gaze. Indeed, behavioral studies of normal subjects and neurological patients (see Farah, 2000, for a review), as well as event-related potentials in humans (Allison et al., 1999; Bentin et al., 1996) and single-unit recording in monkeys (Perrett et al., 1982; Chapter 78, this volume), provide evidence that face perception engages cognitive and neural mechanisms distinct from those engaged during the recognition of other classes of objects. Several brain imaging studies (e.g., Haxby et al., 1991; Puce et al., 1995, 1996; Sergent et al., 1992) described cortical regions that were most active during viewing of faces. However, these studies did not include the kinds of control conditions that are necessary for testing whether the activated regions are selectively involved in face perception.
Kanwisher et al. (1997) scanned subjects with fMRI while they viewed rapidly presented sequences of faces versus sequences of familiar inanimate objects. We found a region in the fusiform gyrus in most subjects, and a second region in the superior temporal sulcus in about half of the subjects, that produced a stronger MR response during face viewing than during object viewing (see also McCarthy et al., 1997). A greater response to faces than to objects could be produced by processes that have nothing to do with face perception per se: greater attentional engagement by faces than by nonfaces, a general response to anything animate or human, or a response to low-level visual features present in face stimuli. To test these and other hypotheses, we first identified the candidate face-selective fusiform region individually in each subject by comparing faces to objects, and then measured the response in this region of interest (ROI) to a number of subsequent contrasting conditions. After demonstrating that the same region responded at least twice as strongly to faces as to any of the other control stimuli, we concluded that this region is indeed selectively involved in face processing and named it the fusiform face area (FFA) (Fig. 79.1, top, and Fig. 79.2, bottom). The claim that the FFA responds selectively or specifically to faces does not mean that it responds exclusively to faces. Although the FFA responds much more to faces than to objects, it responds more to objects than to a baseline condition such as a fixation point. The standard criterion for neural selectivity (Tovee et al., 1993), adopted here, is that the response must be at least twice as great for the preferred stimulus category as for any other stimulus category.
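The distinction between selective and exclusive responses can be stated as two simple checks on ROI responses. The Python sketch below is illustrative only: it assumes responses are expressed as percent signal change from a fixation baseline, the function names are hypothetical, and the numbers in the example are invented rather than measured.

```python
# Hypothetical sketch of the "at least twice as strong" selectivity criterion
# (Tovee et al., 1993) applied to ROI responses. Names and values are illustrative.


def percent_signal_change(condition_mean: float, fixation_mean: float) -> float:
    """Mean ROI signal in a condition as a percentage change from fixation."""
    return 100.0 * (condition_mean - fixation_mean) / fixation_mean


def is_selective(responses: dict[str, float], preferred: str, ratio: float = 2.0) -> bool:
    """True if the preferred category's response is at least `ratio` times the
    response to every other tested category.

    Assumes positive percent-signal-change values; the ratio test is not
    meaningful for responses at or below baseline.
    """
    pref = responses[preferred]
    return all(
        pref >= ratio * value
        for category, value in responses.items()
        if category != preferred
    )


def is_exclusive(responses: dict[str, float], preferred: str) -> bool:
    """True only if no nonpreferred category drives the ROI above baseline."""
    return all(
        value <= 0.0
        for category, value in responses.items()
        if category != preferred
    )


if __name__ == "__main__":
    # Invented percent-signal-change values for a face-selective ROI.
    roi = {"faces": 2.0, "objects": 0.8, "houses": 0.6, "hands": 0.9}
    print(is_selective(roi, "faces"))  # True: 2.0 >= 2 * 0.9
    print(is_exclusive(roi, "faces"))  # False: objects still exceed baseline
```

A region can pass the first check while failing the second, which is the sense in which the FFA is face-selective but not face-exclusive.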
Figure 79.2.
The top row (adapted from Figure 1 of Wada and Yamamoto, 2001) shows the site of a lesion (outlined in red for greater visibility) that produced a severe deficit in face recognition but not in object recognition. The bottom row shows the author's FFA (color indicates regions responding significantly more strongly during face viewing than object viewing). Note the similarity in the anatomical locus of the lesion and the FFA activation, suggesting that an intact FFA may be necessary for face but not object recognition. (See color plate 56.)
By now, the FFA has been studied extensively in many different experiments and labs. These studies generally agree that the FFA responds more strongly to a wide variety of face stimuli (e.g., front-view photographs of faces, line drawings of faces, cat faces, cartoon faces, and upside-down faces) than to various nonface control stimuli, even when each of these (like faces) constitutes multiple similar exemplars of the same category, including houses (Haxby et al., 1999; Kanwisher et al., 1997), hands (Kanwisher et al., 1997), animals, provided that their heads are not visible (Kanwisher et al., 1999; but see Chao et al., 1999), flowers (McCarthy et al., 1997), or cars (Halgren et al., 1999). These effects are similar when the subject is merely passively viewing the stimuli or carrying out a demanding discrimination task on them (Kanwisher et al., 1997), suggesting that the response does not arise from a greater attentional engagement by faces than by other stimuli. Nor can the FFA response to faces be accounted for in terms of a low-level feature confound, as the response is higher when a face is perceived versus not perceived even when the stimulus is unchanged, as in binocular rivalry (Tong et al., 1998) and face-vase reversals (Hasson et al., 2001).
While the basic response properties of the FFA are generally agreed upon, the function of this region is not. The most basic question is whether the function of the FFA is truly specific to faces or whether it involves a domain-general operation that could in principle be applied to other stimuli (despite being more commonly carried out on faces). For example, in our original paper on the FFA, we suggested testing whether it could be activated by inducing holistic encoding on nonface stimuli. Rossion et al. (2000) found that although attending to whole faces, rather than parts of faces, enhanced the right (but not left) FFA response, attending to whole houses, rather than parts of houses, did not. These data argue against the domain-general holistic encoding hypothesis, instead implicating the right FFA in processing holistic/configural aspects of faces.
Gauthier and her colleagues have argued for a somewhat different domain-general hypothesis, according to which the right FFA is specialized for discriminating between any structurally similar exemplars of a given category for which the subject is an expert (Tarr and Gauthier, 2000). However, most of their evidence is based on studies using novel stimuli called Greebles, a suboptimal choice for testing this hypothesis because they have the same basic configuration as a face (i.e., a symmetrical configuration in which two horizontally arranged parts are above two vertically aligned central parts, as in the configuration of eyes, nose, and mouth). Nonetheless, in one study, Gauthier et al. (1999) found that the FFA was activated by cars in car fanatics and birds in bird experts; this result was replicated by Xu and Kanwisher (2001). However, in both studies the effect sizes are small, and the response to faces remains about twice as high as the response to cars in car experts, a result that is consistent with both the face-specificity hypothesis and the subordinate-level-categorization-of-structurally-identical-exemplars-for-which-the-subject-is-expert hypothesis. Stronger evidence on this debate comes from a double dissociation in neurological patients: face recognition impairments can be found in the absence of impairments in the expert discrimination of category exemplars (Henke et al., 1998) and vice versa (Moscovitch et al., 1997). These findings argue that different cortical mechanisms are involved in face perception and in the expert visual discrimination of structurally similar category exemplars (Kanwisher, 2000).
If the face specificity of the FFA is granted, the next question is what exactly the FFA does with faces. The FFA appears not to be involved specifically in discriminating the direction of eye gaze, because it is more active during attention to face identity than during attention to gaze direction, whereas the face-selective region in the superior temporal sulcus responds more strongly in the opposite comparison (Hoffman and Haxby, 2000). Nor is the FFA likely to be specifically involved in extracting emotional expressions from faces, given its consistently high response during viewing of expressionless faces. In studies directly manipulating the presence or absence of emotional expressions in face stimuli, the greatest activation is in the amygdala (Breiter et al., 1996) or anterior insula (Phillips et al., 1997), not the fusiform gyrus. Another hypothesis is that the FFA represents semantic rather than perceptual information (Martin and Chao, 2001). However, this too seems unlikely because (1) this region does not respond more to a familiar face, for which semantic information about the individual is available, than to an unfamiliar face, for which it is not (Gorno-Tempini and Price, 2001; Shah et al., 2001), and (2) this region does not appear to represent abstract semantic information about people in general, as it responds no more when subjects read paragraphs describing people than when they read paragraphs describing inanimate objects, though this same comparison produces robust activation in the superior temporal sulcus (R. Saxe and N. Kanwisher, unpublished data). Thus, the FFA appears not to be involved specifically in extracting information about gaze direction or emotional expression, or in representing semantic information about individual people.
Evidence that this area may be involved simply in detecting the presence of a face comes from the findings that activity in the FFA is strong even for inverted faces (Aguirre et al., 1999; Haxby et al., 1999; Kanwisher et al., 1998) and for line drawings of faces (A. Harris and N. Kanwisher, unpublished data; see also Halgren et al., 1999; Ishai et al., 1999), both of which support easy face detection but not face recognition. However, another study (K. Grill-Spector and N. Kanwisher, unpublished data) found that activity in the right FFA is correlated both with successful detection of faces (i.e., categorizing a stimulus as a face versus a nonface) and with successful discrimination between individual faces, suggesting that it is involved in both of these abilities.
Places
For navigating social primates like humans, one other visual ability is arguably as important as recognizing faces: determining our location in the environment. A region of cortex called the parahippocampal place area (PPA) appears to play an important role in this ability (Epstein and Kanwisher, 1998). The PPA responds strongly whenever subjects view images of places, including indoor and outdoor scenes, as well as more abstract spatial environments such as urban “scenes” made out of Legos, virtual spaces depicted in video games (Aguirre et al., 1996; Maguire et al., 1998), or close-up photographs of desktop scenes (P. Downing, R. Epstein, and N. Kanwisher, unpublished data). Remarkably, the visual complexity and number of objects in the scenes are unimportant: the response is just as high to bare empty rooms (two walls, a floor, and sometimes a door or window) as it is to complex photos of the same rooms fully furnished. The PPA also responds fairly strongly to images of houses cut out from their background (though less than to full scenes), presumably because spatial surroundings are implicit in a depiction of a house. Thus, information about the spatial layout of the scene is apparently critical to the PPA response (Fig. 79.1, middle).
Patients with damage to parahippocampal cortex often suffer from topographical disorientation, an impairment in wayfinding (Aguirre and D'Esposito, 1999; Epstein et al., 2001; Habib and Sirigu, 1987). The core deficit in these patients is an inability to use the appearance of places and buildings for purposes of orientation, perhaps implicating the PPA in place recognition. However, we tested a neurological patient with no PPA and largely preserved place perception but an apparent deficit in learning new place information, suggesting that the PPA may be more critical for encoding scenes into memory than for perceiving them in the first place (Epstein et al., 2001). This possibility is consistent with evidence from other laboratories suggesting that parahippocampal cortex is involved in memory encoding of words (Wagner et al., 1998) and scenes (Brewer et al., 1998).
The PPA is apparently not engaged in processes that rely on knowledge of the specific environment (such as planning a route to a particular location in one's stored cognitive map of the world), as it responds with the same strength to familiar and unfamiliar places: Epstein et al. (1999) presented MIT students and Tufts University students with scenes from the MIT and Tufts campuses, and found no difference in the response to the same images when they depicted familiar rather than unfamiliar places. Interestingly, however, the PPA responded significantly more strongly to familiar than to unfamiliar buildings cut out from their background, perhaps because the missing spatial background was more likely to be inferred for a familiar building.
One attractive idea is that the PPA may constitute the neural instantiation of a previously hypothesized system for spatial reorientation (Cheng, 1986; Hermer and Spelke, 1994). When disoriented rats and human infants must search for a hidden object, they rely largely on the shape of the local environment to reorient themselves and find the object (but see Gouteux et al., 2001; Learmonth et al., 2001). Strikingly, they completely ignore informative landmark cues such as the location of a salient visual object or feature. This led Cheng and others to hypothesize the existence of a geometric module that represents the shape (but not other features) of surrounding space for the purpose of reorientation. The exclusive use of spatial layout information, and not object/landmark information, is tantalizingly reminiscent of the much greater activation of the PPA by images of spatial layouts than images of objects.
How is the PPA related to the two other neural structures most commonly implicated in spatial encoding and navigation, the hippocampus and the parietal lobe? It has been hypothesized that the hippocampus contains a cognitive map of the animal's environment (O'Keefe and Nadel, 1978). In contrast, the parietal lobe has been implicated in representing the specific spatial information that is relevant to guiding current action. In keeping with this division of labor, physiological recordings in animals indicate that the hippocampus contains allocentric (world-centered) representations of place, whereas the parietal lobes contain egocentric (body-centered) representations of spatial locations (Burgess et al., 1999). For example, place cells in the rat hippocampus respond when the animal is in a specific location in its environment, largely independent of which way the animal is facing, while spatial view cells in the primate hippocampus respond when the animal views a given spatial location (Georges-François et al., 1999). In contrast, neurons in the primate parietal cortex apparently represent space in a number of egocentric coordinate frames tied to the location of the retina, hand, or mouth (Colby and Goldberg, 1999). A recent study found that fMRI adaptation to repeated stimuli in the PPA occurs only when the same view of a scene is repeated, implicating the PPA in egocentric rather than allocentric representations of space (Epstein et al., 2003).
In sum, although it is now well established that the PPA responds selectively to information about spatial layouts and places, it remains unclear what exactly the PPA does with this information. Critical questions for future research concern the role of the PPA in reorientation and encoding of spatial information into memory, as well as the nature of the interactions between the PPA, the hippocampus, and the parietal lobe.
Bodies
Our latest addition to the set of category-selective regions of cortex is the extrastriate body area (EBA) (Downing et al., 2001). This region responds about twice as strongly when subjects view images depicting human bodies or body parts (nothing too interesting!) as when they view objects or object parts (Fig. 79.1, bottom). The EBA is found in all subjects in the right (and sometimes also the left) lateral occipitotemporal cortex on the lower lip of the posterior superior temporal sulcus, just superior to area MT/MST. The EBA's response profile is unlikely to reflect low-level stimulus confounds, as the same region responded about twice as strongly to body as to nonbody stimuli even when the two stimulus sets were visually similar (e.g., stick figures versus rearranged versions of stick figures that no longer corresponded to body configurations; silhouettes of people versus slightly rearranged silhouettes). Further experiments showed that the EBA does not simply respond to anything living, animate, or known to be capable of motion, or to any object with parts that can move relative to each other: the EBA responds more to human bodies than to trees, mammals, or objects with movable parts such as scissors, staplers, and corkscrews. The one exception to the body specificity of the EBA is the fact that this region responds no more to faces than to objects. As expected from this result, the EBA does not overlap much, if at all, with the face-selective region in the superior temporal sulcus.
At present, the function of the EBA is unknown. It may be involved in recognizing individuals (when their faces are hidden or far away), or in perceiving the body configuration of other people, or even in perceiving the location of one's own body parts. The EBA is suggestively close to area MT, perhaps implicating it in integrating information about body shape and motion (Grossman et al., 2000). The EBA is also close to other regions that have been shown to be activated during social perception, from discriminating the direction of eye gaze, to perceiving or inferring intentions, to perceiving human voices. Thus the EBA may be part of a broader network of nearby areas involved in social perception and social cognition.
What Else?
How many category-selective regions of cortex exist in the human visual pathway? Other categories, including animals and tools, have been reported to selectively activate focal regions of cortex (Martin and Chao, 2001). However, the evidence is not as strong in these cases. When only a few stimulus categories have been compared, apparent category selectivity must be treated cautiously. For example, we found a region that responded more strongly to chairs than to faces or places, replicating the findings of Ishai et al. (1999), but the same region responded just as strongly to pictures of food, animals, and flowers. In ongoing work in our lab, we have tested well over a dozen categories (P. Downing and N. Kanwisher, unpublished data); so far, we have found no other regions of cortex that exhibit the strong category selectivity typical of the FFA, PPA, and EBA. Thus, it appears that faces, places, and bodies may be unusual in the way they are processed and represented in the cortex. The apparent lack of other category-selective regions of cortex raises the question of how other kinds of objects are represented.
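Why a small comparison set can be misleading is easy to see if the twice-as-strong criterion described for the FFA is applied to the same region against two different sets of categories. The Python sketch below assumes responses are percent signal change from fixation; the function name and all numbers are hypothetical, chosen only to mirror the qualitative pattern of the chair example, and are not data from that study.

```python
# Hypothetical illustration: one ROI judged against a small versus a larger set
# of comparison categories. All response values are invented.


def failing_categories(responses: dict[str, float], preferred: str,
                       ratio: float = 2.0) -> list[str]:
    """Return the nonpreferred categories that break the selectivity criterion,
    i.e., those whose response is more than 1/ratio of the preferred response."""
    pref = responses[preferred]
    return sorted(
        category
        for category, value in responses.items()
        if category != preferred and ratio * value > pref
    )


small_set = {"chairs": 1.5, "faces": 0.6, "places": 0.5}
large_set = {**small_set, "food": 1.4, "animals": 1.5, "flowers": 1.3}

print(failing_categories(small_set, "chairs"))  # []
print(failing_categories(large_set, "chairs"))  # ['animals', 'flowers', 'food']
```

With only faces and places as controls, the region looks chair-selective; adding categories that drive it about as strongly removes that conclusion.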