Local motion
Images are formed as projections of the three-dimensional world onto a two-dimensional light-sensing surface. This surface could be, for example, a piece of photographic film, an array of light sensors in a television camera, or the photoreceptors in the back of a human eye. At each point on the surface, the image brightness is a measurement of how much light fell on the surface at that spatial position at a particular time (or over some interval of time). When an object in the world moves relative to this projection surface, the two-dimensional projection of that object moves within the image. The movement of the projected position of each point in the world is referred to as the motion field.
The estimation of the motion field is generally assumed to be the first goal of motion processing in machine vision systems. There is also evidence that this sort of computation is performed by biological systems. The motion field must be estimated from the spatiotemporal pattern of image brightness. This is usually done by assuming that the brightness generated by each point in the world remains constant over time. In this case, the estimated motion of these constant-brightness points (known as the optical flow) is also an estimate of the motion field. But as many authors have shown, the optical flow is not always a good estimate of the motion field (e.g., Horn, 1986; Verri and Poggio, 1989). For example, when a shiny object moves, specular highlights often move across the surface of the object. In this situation, the optical flow (corresponding to the highlight motion) does not correspond to the motion of any point on the object. Nevertheless, estimates of optical flow are almost universally used as approximations of the motion field.
In estimating optical flow, we cannot ask about the motion of an isolated point without considering the context surrounding it. That is, we can only recognize the motion of local patterns of brightness. But our ability to estimate a unique velocity at a given image location depends critically on the structure of the image in the neighborhood of that location. Consider first the simplest situation, in which an object is moving horizontally, perpendicular to the line of sight. Figure 109.1 depicts three prototypical situations that can arise. First, the local brightness might be constant. In this case, the local measurements place no constraint on the velocity. We will refer to this as the blank-wall problem.
Figure 109.1.
Conceptual illustration of motion estimation in three different regions of an image of a horizontally translating cube. In a region of constant brightness (top face of the cube), the local velocity is completely unconstrained since the observed image is not changing over time. We refer to this as the blank-wall problem. In a region where the brightness varies only along a unique spatial direction (striped side of the cube), the brightness changes are consistent with a one-dimensional set of velocities: one can determine the motion perpendicular to the stripes but not the motion along the stripes. This is known as the aperture problem. Finally, in a region where the brightness changes in all spatial directions (hatched side of the cube), a unique velocity is consistent with the observed brightness changes.
Second, the local brightness might vary in only one direction—that is, the spatial pattern could be striped. In this case, only the velocity component that is perpendicular to the stripes is constrained. Any component along the stripes will not create a change in the image and thus cannot be estimated. This is typically known in the literature as the aperture problem (Fennema and Thompson, 1979; Marr and Ullman, 1981; Wallach, 1935). The expression refers to the fact that the motion of a moving one-dimensional pattern viewed through a circular aperture is ambiguous. The problem is not due to the aperture but arises from the one-dimensionality of the signal.
Finally, the local brightness may vary two-dimensionally, in which case the optical flow vector is uniquely constrained. But because of the occurrence of underconstrained regions (blank-wall and aperture problems), a full solution for the motion problem in which all image points are assigned a velocity vector seems to require the integration of information across spatial neighborhoods of the image (and perhaps over time as well). This concept has been studied and developed for many years in computer vision (e.g., Hildreth, 1984; Horn and Schunck, 1981; Lucas and Kanade, 1981).
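The three situations can be made concrete with a small numerical sketch. The code below is an illustration only, not the method developed in this chapter: it assumes a gradient-based least-squares formulation, in the spirit of the neighborhood-integration approaches cited above, in which brightness constancy is linearized as Ix*vx + Iy*vy + It = 0 at each pixel and the constraints are pooled over a patch. All names (make_frames, gradient_sums, estimate_velocity, PATTERNS), the synthetic patterns, and the tolerances are hypothetical choices made for the example.

```python
import math

def make_frames(pattern, size=16, vx=1.0, vy=0.0):
    """Two frames of a brightness pattern translating by (vx, vy) pixels per frame."""
    f0 = [[pattern(x, y) for x in range(size)] for y in range(size)]
    f1 = [[pattern(x - vx, y - vy) for x in range(size)] for y in range(size)]
    return f0, f1

def gradient_sums(f0, f1):
    """Sum products of finite-difference derivatives over the patch interior."""
    sxx = sxy = syy = sxt = syt = 0.0
    n = len(f0)
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            # Central differences in space, averaged over the two frames.
            ix = 0.25 * (f0[y][x + 1] - f0[y][x - 1] + f1[y][x + 1] - f1[y][x - 1])
            iy = 0.25 * (f0[y + 1][x] - f0[y - 1][x] + f1[y + 1][x] - f1[y - 1][x])
            it = f1[y][x] - f0[y][x]  # two-frame temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    return sxx, sxy, syy, sxt, syt

def estimate_velocity(f0, f1, rel_tol=1e-6, abs_tol=1e-9):
    """Classify the patch and return (label, velocity or None).

    The 2x2 matrix M = [[sxx, sxy], [sxy, syy]] summarizes local image
    structure; its rank (0, 1, or 2) separates the three cases.
    """
    sxx, sxy, syy, sxt, syt = gradient_sums(f0, f1)
    half_tr = 0.5 * (sxx + syy)
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(half_tr * half_tr - det, 0.0))
    lam1, lam2 = half_tr + disc, half_tr - disc  # eigenvalues, lam1 >= lam2
    if lam1 < abs_tol:
        # No brightness variation: velocity is unconstrained.
        return "blank wall", None
    if lam2 < rel_tol * lam1:
        # Variation along one direction only: recover just the velocity
        # component along the principal eigenvector (normal to the stripes).
        if sxy != 0.0:
            ex, ey = lam1 - syy, sxy
        elif sxx >= syy:
            ex, ey = 1.0, 0.0
        else:
            ex, ey = 0.0, 1.0
        norm = math.hypot(ex, ey)
        ex, ey = ex / norm, ey / norm
        speed = -(sxt * ex + syt * ey) / lam1
        return "aperture", (speed * ex, speed * ey)
    # Two-dimensional variation: solve M v = -b for a unique velocity.
    vx = -(syy * sxt - sxy * syt) / det
    vy = -(sxx * syt - sxy * sxt) / det
    return "unique", (vx, vy)

K = 0.4  # spatial frequency of the synthetic patterns (radians per pixel)
PATTERNS = {
    "constant": lambda x, y: 0.5,
    "striped": lambda x, y: math.sin(K * (x + y)),            # oblique stripes
    "textured": lambda x, y: math.sin(K * x) + math.sin(K * y),
}

for name, pattern in PATTERNS.items():
    frames = make_frames(pattern)  # true motion: 1 pixel/frame, horizontal
    print(name, estimate_velocity(*frames))
```

For the oblique stripes, the true motion is horizontal, but only its projection onto the stripe normal can be recovered, so the estimate points diagonally and has a smaller magnitude than the true velocity. The recovered values are approximate because the derivatives are finite differences.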
In addition to the blank-wall and aperture problems, in which the velocity is underconstrained, there are often locations in an image at which multiple velocity signals interact. In particular, this can occur at occlusion boundaries of objects, where any local spatial neighborhood must necessarily include some portion of both object and background, which are typically moving differently. Another example occurs in the presence of transparently combined surfaces or specular highlights. In each of these cases, the local motion description requires more than one velocity, along with some sort of assignment of which image content belongs to which velocity.
Solutions for the multiple-motion problem have been developed fairly recently. Specifically, a number of authors have proposed that one should simultaneously decompose the image into consistently moving layers of brightness content and estimate the motions within those layers (Ayer and Sawhney, 1995; Darrell and Pentland, 1995; Wang and Adelson, 1994; Weiss and Adelson, 1996). Some authors have suggested that this sort of solution might be implemented biologically (Darrell and Simoncelli, 1994; Koechlin et al., 1999; Nowlan and Sejnowski, 1995).
We'll return to this point in the “Discussion” section, but for most of the chapter, we'll restrict our attention to the simple case of translational motion over a local patch of the image, ignoring the possibility of multiple motions. As in much of the computational motion literature, we view this as a building block that could be combined with further processing to provide a more complete solution for the analysis of motion. The goal of this chapter is to introduce a framework for thinking about motion and to interpret the components of that framework physiologically.