Interpreting local depth measurements: the contrast depth asymmetry principle
In this section we discuss how occlusion constrains the interpretation of local depth estimates. Specifically, we show that occlusion enforces a crucial asymmetry between relatively near and relatively distant structures that can have profound implications for the representation of surface layout. Although the principles are discussed in terms of binocular disparity, the fundamental logic relates to the geometry of occlusion and therefore applies to any local estimate of depth.
Binocular Stereopsis and the Correspondence Problem
Binocular stereopsis is the most thoroughly studied source of information about depth. Binocular depth perception relies on the fact that the two eyes receive slightly different views of the same scene. Because of the horizontal parallax between the two views, a given feature in the world often projects to two slightly different locations on the two retinas (Fig. 86.1). These small differences in retinal location, or binocular disparities, vary systematically with distance in depth from the point of convergence and can thus be used to triangulate depth. For a thorough treatment of stereopsis, see Howard and Rogers (1995) and Chapter 87.
Figure 86.1.
a, The two eyes converge by angle α on a point P. Therefore, by definition, P projects to the foveae of both eyes (P′). The Vieth-Müller circle is one of the geometrical horopters, that is, it traces a locus of points in space that project to equivalent retinal locations in the two eyes and thus carry no interocular disparity. Point Q is closer to the observer than P (as it falls inside the horopter). Therefore, it projects to different locations on the two retinas (Q′). The difference in the locations of Q′ is the binocular disparity, which can be scaled by the vergence angle, α, to derive depth. b, When the visual field contains many points, there is a potential ambiguity concerning which image features correspond in the two eyes. Correct matches yield correct depth estimates, such as dA. c, By contrast, false matches yield erroneous depth estimates. Here, the image of point A has been incorrectly matched with the image of point B, leading to an incorrect depth estimate, d*.
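The geometry in panel a can be sketched numerically. The following is a minimal illustration, not a model of the visual system: it assumes symmetric fixation, points lying on the midline between the eyes, an interocular separation of 0.065 m, and a sign convention in which positive disparity means nearer than fixation. The function and parameter names are hypothetical.

```python
import math

def vergence_angle(distance_m, interocular_m=0.065):
    """Angle (radians) subtended by the two eyes at a midline point."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))

def disparity(fixation_m, target_m, interocular_m=0.065):
    """Disparity (radians) of a midline target relative to fixation.

    Assumed convention: positive when the target is nearer than fixation.
    """
    return (vergence_angle(target_m, interocular_m)
            - vergence_angle(fixation_m, interocular_m))

def distance_from_disparity(fixation_m, disparity_rad, interocular_m=0.065):
    """Recover target distance by scaling the disparity with the vergence angle."""
    alpha = vergence_angle(fixation_m, interocular_m)
    return interocular_m / (2.0 * math.tan((alpha + disparity_rad) / 2.0))

# Point P is fixated at 1.0 m; point Q lies inside the horopter at 0.8 m.
delta_q = disparity(1.0, 0.8)
recovered = distance_from_disparity(1.0, delta_q)
```

Under these assumptions the disparity alone does not fix the distance of Q; the same disparity corresponds to a different depth when the vergence angle changes, which is why the disparity has to be scaled by α.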
In order to determine the disparity of a feature in the world, the visual system must localize that feature in the two retinal images. Once it has identified matching image features, the difference in retinal location is the binocular disparity, which can then be scaled to estimate depth. The visual system must not measure the disparity between features that do not belong together; otherwise, it will derive spurious depth estimates (Fig. 86.1). Because of this, the accuracy of the matching process is critical to binocular depth perception. The problem of identifying matching features in the two eyes' views (i.e., features that originate from a common source in the world) is known as the correspondence problem.
If the features that the visual system localizes in the two images are very simple, such as raw intensity values (or pixels), then in principle there could be many distracting features that do not in reality share a common origin in the world. Under these conditions the correspondence problem would be difficult, as the visual system would have to identify the one true match from among a large number of false targets.
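The scale of the false-target problem can be illustrated with a toy example. The sketch below assumes that features are indistinguishable points on a single horizontal scanline, described only by their horizontal image positions, and that the true matches are the index-aligned pairs; the positions, the function names, and the left-minus-right sign convention are all hypothetical.

```python
from itertools import product

def candidate_matches(left_xs, right_xs):
    """Pair every left-image feature with every right-image feature.

    Returns (left_index, right_index, disparity) triples, where disparity
    is the left position minus the right position (assumed convention).
    """
    return [(i, j, xl - xr)
            for (i, xl), (j, xr) in product(enumerate(left_xs),
                                            enumerate(right_xs))]

def count_false_targets(left_xs, right_xs):
    """Count pairings that are not true matches.

    Assumes, for illustration only, that feature i in the left image
    truly corresponds to feature i in the right image.
    """
    return sum(1 for i, j, _ in candidate_matches(left_xs, right_xs) if i != j)

left_xs = [10, 20, 30]    # hypothetical feature positions, left image
right_xs = [8, 19, 27]    # hypothetical feature positions, right image
candidates = candidate_matches(left_xs, right_xs)
n_false = count_false_targets(left_xs, right_xs)
```

With N indistinguishable features per image there are N × N candidate pairings but only N true matches, so the number of false targets grows roughly with the square of the number of features.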
However, there is considerable debate about what types of image features the visual system matches to determine disparity (Jones and Malik, 1992; Julesz, 1960, 1971; Marr and Poggio, 1976, 1979; Pollard et al., 1985; Prazdny, 1985; Sperling, 1970). Psychophysically, at least, it now seems unlikely that the visual system matches raw luminances. Rather, the visual system seems to match local contrast signals, that is, localizable variations in intensity, such as luminance edges (Anderson and Nakayama, 1994; Smallman and McKee, 1995). This seems an almost inevitable consequence of early visual processing, which maximizes sensitivity to contrasts rather than to absolute luminances (Cornsweet, 1970; Hartline, 1940; Ratliff, 1965; Wallach, 1948). By the time binocular information converges in V1, the visual field appears to be represented in terms of local measurements of oriented contrast energy (De Valois and De Valois, 1988; Hubel and Wiesel, 1962), and thus it is likely that these are the features from which disparity is computed.
If this is true, then the image features that carry disparity information are local contrasts such as luminance edges. However, this poses a problem for the visual system, for in order to capture the functional units of the environment, the visual representation of depth should be tied to surfaces and objects, not to local image features. There is therefore a potential discrepancy between the image features that carry disparity information (i.e., local contrasts) and the perceptual structures to which depth is assigned (i.e., regions) in the ultimate representation of environmental layout. This discrepancy plays a critical role in the theoretical discussion that follows (see Anderson, in press).
A local image feature, such as an edge, has only one true match in the other eye's image. Therefore, the edge carries only one disparity. However, depth is ultimately assigned to the two regions that meet to form the edge. This results in a problem: in order to represent surface structure, the visual system must assign depth to both sides of an edge, even though the edge carries only one disparity (Fig. 86.2). How does the visual system infer the depths of two regions from every local disparity signal? We will show that the geometry of occlusion imposes an inviolable constraint on the interpretation of local disparity-carrying features. To anticipate, we show that the simple fact that near surfaces can occlude more distant ones, but not vice versa, has profound consequences for the assignment of depth to whole regions.
Figure 86.2.
a, The image of a square occluding a diamond. A receptive field of limited extent (the ellipse) captures only local information about the scene, here a vertical luminance edge. This local information is ambiguous, as many different scenes could have resulted in the same image feature. b, If disparity is calculated by matching local contrasts, then the edge carries only a single disparity. However, in this case, the light and dark sides of the edge result from two distinct objects, and therefore different depths have to be assigned to the two sides of the edge.
Asymmetries in Depth: A Demonstration
By way of motivation for the theoretical discussion that follows, consider Figure 86.3, which is based on a figure developed by Takeichi et al. (1992). The figure consists of a Kanizsa illusory triangle and three diamonds. When disparity places the diamonds closer to the observer than the triangle and inducers (by cross-fusing the stereopair on the left of Fig. 86.3), the diamonds appear to float independently in front of the background, and the Kanizsa triangle tends to be seen as a figure in front of the circular inducers; this percept is schematized in Figure 86.3B. The disparities in the display can be inverted simply by swapping the left and right eyes' views, as can be seen by cross-fusing the stereopair on the right side of Figure 86.3. In this case, what was previously distant becomes near and vice versa, such that the diamonds are placed behind the plane of the inducers. In both versions of the display, the triangle itself carries no disparity relative to the circular inducers; only the disparity of the diamonds changes from near to far. This simple inversion leads to a change in surface representation that is more complex than a simple reversal in the depth ordering of the perceptual units (as schematized in Fig. 86.3B). When the diamonds recede, they drag their background back with them, such that the triangle appears as a hole through which the observer can see a white surface; the three black diamonds lie embedded in the more distant white surface. This recession of the background has a secondary effect of increasing the strength of the illusory contour (the border of the triangle).
Figure 86.3.
Asymmetries in depth interpolation. a, When the left stereopair is cross-fused, the diamonds appear to float independently in front of the Kanizsa triangle, as schematized in b. When the disparity of the diamonds is inverted (by cross-fusing the right stereopair), the diamonds drag their background with them, creating the percept of a triangular hole, even though only the disparity of the diamonds has changed (c). This asymmetrical change in surface structure can be explained by the contrast depth asymmetry principle (see text). (Adapted from Takeichi et al., 1992.)
The important observations with regard to the theory are the following. First, when the diamonds are in front, they are freely floating and separate, while when they recede, they drag the background with them. Second, when the diamonds are forward, the Kanizsa triangle tends to be seen as a figure (rather than ground), but when the diamonds are more distant, the triangle is seen as a hole. And yet all that changed in the display was the disparity of the diamonds. Why does this simple reversal in depth lead to an asymmetric change in the surface representation? Why does the disparity of the diamonds influence the appearance of the triangle? These are the asymmetries of depth to which the following discussion pertains.
From Features to Surfaces: Interpretation of Local Disparity Signals
Let us assume that the visual system has located a luminance edge and derived a disparity, d0, from that edge. What possible surface configurations are consistent with the local disparity measurement? Broadly, the legal interpretations fall into two classes, as shown in Figure 86.4. The first class consists of surface events in which both sides of the edge meet at the depth of the edge, d0. There are many surface events for which this is the case: reflectance edges, cast shadows, and creases in the surface, to name just three. When the feature originates from a continuous manifold, as in these cases, interpretation is simple, as both sides of the edge are assigned the same depth, d0.
Figure 86.4.
A contour which carries a depth signal (e.g., disparity) is inherently ambiguous. Two main classes of world states could have given rise to the contour: the contour could have originated from a single continuous surface (e.g., a reflectance edge or cast shadow), or it could have originated from an occlusion event. In the occlusion case, the border ownership of the contour (i.e., which side is the occluder) is ambiguous. Nonetheless, in all configurations, both sides of the contour are constrained to be at least as far as the depth signal carried by the contour. This introduces a fundamental asymmetry in the role of near and far contours in determining surface structure (see text for details). (Adapted from Anderson, in press; see also Anderson et al., 2002.)
The second class of interpretations occurs when the edge corresponds to an object boundary and therefore represents a depth discontinuity (Fig. 86.4). In this case, one side of the edge lies at the depth of the occluding object, and the other side of the edge lies at the depth of the background. Therefore, the visual system must assign different depths to the two sides of the edge. How can the visual system assign two depths, when it is given only one disparity, d0? The answer is that it only assigns a unique depth to the occluding side. The critical insight is the following: the depth measurement acquired at an occluding edge only specifies the depth of the occluding surface. The visual system assigns depth d0 to the occluding surface. All that it knows about the other side is that it must be more distant than the occluding surface. If the more distant surface is untextured, then it could be at any depth behind the occluder and the local image data would remain the same. By contrast, if the depth of the occluding surface varies, the disparity carried by the object boundary must also change, because the occluding surface “owns” the contour (Koffka, 1935; Nakayama et al., 1989) and is therefore responsible for the disparity associated with the edge.
Although the visual system cannot uniquely derive the depth of the occluded side (i.e., the background) from the local disparity computation, there is one critical piece of information that it does have: the occluded side is more distant than the occluder. There is no way for an occluding object to be more distant than the background that it occludes. If the background is brought closer than the object, then the background becomes the occluding surface and carries the edge with it. In this way, occlusion introduces a fundamental asymmetry into the interpretation of disparity-carrying edges: the occluded side of the edge can be at any distance greater than d0, but neither side can be nearer than d0.
We can summarize the possible depth assignments (from the occlusion and nonocclusion classes just described) in the form of a constraint on the interpretation of local disparity-carrying contrasts, termed the contrast depth asymmetry principle (Anderson, in press; see also Anderson et al., 2002):
Both sides of an edge must be situated at a depth that is greater than or equal to the depth carried by that edge.
Although this geometric fact is simple in form, it can have pronounced effects on the global interpretation of images when the constraint applies to all edges simultaneously. We will now run through an example to show how the principle can explain the asymmetric changes in perceived surface structure that occur when near and far disparities are inverted.
Application of the Contrast Depth Asymmetry Principle
In order to demonstrate the explanatory power of the contrast depth asymmetry principle (hereafter CDAP), we will now use it to account for the demonstration in Figure 86.3. Recall that when the diamonds carry near disparity, they float freely in front of the background, and the illusory triangle tends to be seen as figure. When the disparity is reversed, however, the diamonds drag the background back with them, and the triangle appears as a hole. This asymmetry in surface layout is depicted in Figure 86.3B.
Let us first consider the case in which the diamonds appear to float in front. The visual system has to interpret the disparity signals carried by the edges of the diamonds. The CDAP requires both sides of the diamonds' edges (i.e., the black inside and the white outside of the diamonds) to be at least as distant as the edges. Now consider the “pacman” inducers, which are more distant than the diamonds. The constraint likewise requires both sides of the inducers' edges to be at least as distant as those edges. This means that all of the black interior of the inducers must be at least this distant and, more importantly, all of the white background must be at least this distant, which is farther than the depth specified by the disparity of the diamonds. If all of the white background is farther than the diamonds, then the edges of the diamonds must be occluding edges, and the black interior of the diamonds must be an occluding surface. This explains why the diamonds are seen as independent occluders, floating in front of the large white background and black inducers: the edges of the “pacman” inducers drag the white background back, leaving the diamonds floating in front.
Now consider the case in which the diamonds are more distant than the inducers. Again, the CDAP requires both the inside and the outside of the diamonds to be at least as far back as their disparity dictates. This means that both the diamonds and their white background are dragged back to the more distant disparity. Now consider the “pacman” inducers, which carry a relatively near disparity. Because the white background behind the diamonds has been dragged back with the diamonds, the inducers and their white background must be occluding surfaces. This means that the background immediately surrounding the diamonds must be visible through a hole in the occluding surface. The edges of this hole are the illusory contours of the Kanizsa figure. Note again: the fact that both sides of every edge have to be at least as far as the edge leads to asymmetrical surface structures when disparities are inverted.
This is just one example that shows how the CDAP can account for asymmetrical effects of relatively near and relatively far disparities on perceived surface layout. Because the CDAP is derived from the geometry of occlusion, it can account for a very large number of displays and can be used to generate surprising new displays (Anderson, 1999; Anderson, in press).
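The reasoning applied to Figure 86.3 can be sketched as a simple propagation of lower bounds. The code below is a deliberately reduced illustration rather than the authors' model: it assumes that depth increases with distance, that the display has only three image regions (the black diamonds, the black inducers, and a single white region), and that each edge is listed with the depth its disparity specifies. It does not model illusory contours or the splitting of the white region into a near surround and a far surface seen through a hole. All names and depth values are hypothetical.

```python
def region_lower_bounds(edges):
    """Nearest depth each region may take, given (edge_depth, region_a, region_b) edges.

    By the CDAP, a region must be at least as far as every edge it borders,
    so its lower bound is the largest depth among those edges.
    """
    bounds = {}
    for depth, a, b in edges:
        for region in (a, b):
            bounds[region] = max(bounds.get(region, depth), depth)
    return bounds

def forced_occluders(edges):
    """For each edge, report which side is forced to own it, if any.

    A side whose lower bound exceeds the edge depth must lie behind the
    edge, so the other side must be the occluder. None means the edge is
    also compatible with a single continuous surface.
    """
    bounds = region_lower_bounds(edges)
    forced = {}
    for depth, a, b in edges:
        a_behind = bounds[a] > depth
        b_behind = bounds[b] > depth
        if a_behind and b_behind:
            forced[(a, b)] = "inconsistent"
        elif b_behind:
            forced[(a, b)] = a
        elif a_behind:
            forced[(a, b)] = b
        else:
            forced[(a, b)] = None
    return forced

NEAR, FAR = 1.0, 2.0  # hypothetical depths; larger means farther

diamonds_near = [(NEAR, "diamond", "white"), (FAR, "inducer", "white")]
diamonds_far = [(FAR, "diamond", "white"), (NEAR, "inducer", "white")]

near_case = forced_occluders(diamonds_near)
far_case = forced_occluders(diamonds_far)
```

In this reduced setting, placing the diamonds near forces the diamonds to own their edges, so they must float in front of the white region, which the inducer edges have pushed back. Placing the diamonds far leaves their edges compatible with a continuous surface, so the diamonds can lie embedded in the receding white region, and it is the inducers that are forced to be occluders. Inverting the disparities therefore does not simply invert the roles of the regions.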