MIT CogNet, The Brain Sciences Connection. From the MIT Press.
The CogNet Library : References Collection
The Visual Neurosciences : Table of Contents : Neural Mechanisms of Natural Scene Perception : Section 1
 

Scene statistics

Natural images are obviously quite complex and may appear almost impossible to characterize. Fortunately, several reasonable assumptions simplify analysis of natural scenes. A natural scene can be broadly defined as any scene that is likely to be experienced in the normal evolution or development of an organism. This excludes many of the simple stimulus patterns, such as gratings and white noise, often used to study vision but rarely encountered in nature. It is also useful to ignore the structure of any single, specific image and focus instead on characterizing the statistical distribution of a large ensemble of scenes. Any specific regularities in this distribution could, in principle, be exploited by the visual system. Another simplifying assumption is viewpoint invariance, which reflects the fact that the camera (or eye) could observe a scene from any vantage point. Finally, it is useful to consider the spatial and temporal statistics of natural scenes in isolation insofar as this is possible. The spatial statistics of the scenes are determined by the structure of the world, while the temporal statistics are determined by several factors: the motion of objects in the world, the motion of the observer through the world, and the observer's eye movements.

The first-order spatial and temporal statistics of natural images are relatively well understood (Olshausen and Field, 2000; Ruderman, 1994). They describe the distribution of luminance values that might appear in any image. The second-order spatial statistics are more interesting; they describe the correlations between pairs of pixels. These correlations can be obtained from the spatial autocorrelation or from its Fourier transform, the power spectrum. Analysis of the autocorrelation matrix of the distribution of natural scenes reveals that nearby pixels tend to have similar brightness values and that the strength of this correlation falls off with distance. Correspondingly, analysis of the power spectrum of this distribution shows that power decreases as 1/f² (where f is spatial frequency; note that the power spectrum is the square of the amplitude spectrum, so the amplitude spectrum falls as 1/f). In other words, low spatial frequencies in natural scenes tend to have the most power, and power decreases with increasing spatial frequency (Field, 1987) (Fig. 107.1). These second-order statistics may reflect both scale invariance (Field, 1987; Ruderman, 1997) and the distribution of one- and two-dimensional image features such as edges, curves, line terminations, and so on (Balboa et al., 2001).
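The 1/f² relationship can be checked numerically. The following is a minimal sketch, assuming a square grayscale image stored as a NumPy array; the function names (`radial_power_spectrum`, `spectral_slope`, `synthetic_one_over_f`) are hypothetical and are not taken from the studies cited above. It averages the two-dimensional power spectrum over orientation and fits a line in log-log coordinates. A slope near -2 corresponds to a 1/f² power spectrum, and a slope near 0 corresponds to white noise.

```python
import numpy as np

def radial_power_spectrum(image):
    """Return (frequencies, mean power), averaged over orientation.

    Assumes a square 2-D array. Frequencies are in cycles per image.
    """
    n = image.shape[0]
    # Power spectrum = squared amplitude spectrum (mean removed to suppress DC).
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    # Integer radial frequency bin for every Fourier coefficient.
    radius = np.rint(np.hypot(fx, fy) * n).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    freqs = np.arange(1, n // 2)  # skip DC, stay below the Nyquist frequency
    return freqs, sums[freqs] / counts[freqs]

def spectral_slope(image):
    """Slope of log(power) against log(frequency)."""
    freqs, power = radial_power_spectrum(image)
    slope, _intercept = np.polyfit(np.log(freqs), np.log(power), 1)
    return slope

def synthetic_one_over_f(n, seed=0):
    """White noise filtered so that its amplitude spectrum falls as 1/f."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n))
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0  # avoid division by zero at DC
    return np.real(np.fft.ifft2(np.fft.fft2(white) / f))
```

Applied to a photograph of a natural scene, `spectral_slope` would be expected to return a value in the vicinity of -2, although individual images vary around this ensemble average.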

Figure 107.1.

A, White noise. This stimulus has a flat power spectrum and no correlations between pixels. B, 1/f noise. This stimulus has a 1/f² power spectrum like that of natural images, and therefore the same pairwise correlations between pixels, but no higher-order structure. C, A natural image. D, The natural image shown in C, with a normal power spectrum and a scrambled phase spectrum. The image appears similar to the 1/f noise pattern shown in B. E, The natural image shown in C, but with a whitened power spectrum and a normal phase spectrum. The image is recognizable.


Natural scenes contain rich statistical structure beyond the two-point spatial correlations described by the power spectrum. Edges and lines, textures, surface patches, and objects all can contain correlations higher than second-order. A simple demonstration illustrates the importance of higher-order structure for natural scene perception: if the power spectrum of a white noise pattern is adjusted to match the 1/f² spectrum of natural scenes, the resulting image will look like shapeless clouds (Fig. 107.1). This indicates that the appearance of a natural scene is not entirely (or even predominantly) determined by its power spectrum. Conversely, if we weight the Fourier coefficients of a natural scene so that all frequencies have equal power (a process called whitening), the resulting image still contains substantial identifiable structure. Thus, discarding the characteristic shape of the power spectrum does not destroy the identity of the scene. Finally, it is easy to discriminate between scenes whose power spectra have been shuffled (i.e., the amplitudes of the spatial frequencies have been shuffled randomly). In contrast, it is impossible to discriminate between images whose phase spectra are randomized. This implies that much of the information about the meaning of specific natural scenes is carried in the spatial phase spectrum (Piotrowski and Campbell, 1982; Tadmor and Tolhurst, 1993; Thomson and Foster, 1997).
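The two manipulations described above, phase scrambling and whitening, can be sketched with a discrete Fourier transform. This is an illustrative sketch under stated assumptions, not the procedure used in the cited studies: the image is assumed to be a 2-D NumPy array, the function names are hypothetical, and the random phases are borrowed from the transform of a white noise image so that the result stays real-valued.

```python
import numpy as np

def phase_scramble(image, seed=0):
    """Keep the amplitude spectrum of `image`, replace its phase spectrum.

    The random phases come from the FFT of real white noise, so they have
    the conjugate symmetry needed for a real-valued output image.
    """
    rng = np.random.default_rng(seed)
    amplitude = np.abs(np.fft.fft2(image))
    noise = rng.standard_normal(image.shape)
    random_phase = np.angle(np.fft.fft2(noise))
    scrambled = amplitude * np.exp(1j * random_phase)
    return np.real(np.fft.ifft2(scrambled))

def whiten(image, eps=1e-12):
    """Give every spatial frequency equal power, keep the phase spectrum.

    `eps` is a small assumed constant that guards against division by zero.
    """
    spectrum = np.fft.fft2(image)
    flat = spectrum / (np.abs(spectrum) + eps)
    return np.real(np.fft.ifft2(flat))
```

Under these assumptions, `phase_scramble` applied to a natural image corresponds to panel D of Fig. 107.1 and `whiten` corresponds to panel E.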

It is difficult to quantify the higher-order spatial statistics of natural scenes (Krieger et al., 1997; Thomson, 1999; Wegmann and Zetzsche, 1990b) and their phase characteristics (Thomson, 2001). This is partly due to the common occurrence of surface and volume occlusions in natural scenes. Occlusion in natural images is a fundamentally nonlinear process, and few methods are available for quantifying the resulting distributions (Lee et al., 2001). Even so, useful image models can be obtained by using established statistical methods that do not explicitly consider occlusion.

One of the most useful statistical methods for discovering important structure in natural scenes has been independent components analysis (ICA) (Hyvarinen et al., 2001). Assume that each natural image reflects the combined influence of many independent factors that represent the structural elements of the image. Instead of identifying these elements according to some explicit theory, ICA extracts them directly from a large sample of images. ICA finds the independent factors that can be linearly combined to create any image from a distribution of natural scenes.
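The linear model assumed by ICA can be written compactly. The notation below is an assumption introduced here for illustration (it does not appear in the text): x is an image written as a vector of pixel values, a_i is the i-th independent component, and s_i is the coefficient that says how strongly that component is present in the image.

```latex
\mathbf{x} \;=\; \sum_{i=1}^{n} s_i \,\mathbf{a}_i \;=\; A\,\mathbf{s},
\qquad
p(\mathbf{s}) \;=\; \prod_{i=1}^{n} p_i(s_i)
```

The first expression states that every image is a linear combination of the components (the columns of A). The second states the independence assumption: the joint distribution of the coefficients factorizes into a product of the individual distributions. ICA estimates A from a sample of images so that this factorization holds as closely as possible.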

The theoretical importance of independent components (ICs) for image coding can be understood by considering the probability density function (PDF) of pixel correlations in natural images. If the PDF is Gaussian, then there are no privileged directions in correlation space and all linear coding strategies are equally efficient. However, if the distribution is not Gaussian, then image codes that are aligned with the most kurtotic dimensions will be most efficient. ICA allows one to discover these important non-Gaussian dimensions, even if they reflect image correlations beyond second-order.
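Kurtosis, the quantity that singles out the useful coding dimensions, is easy to compute. The sketch below uses the standard definition of excess kurtosis (fourth standardized moment minus 3), which is zero for a Gaussian and positive for peaked, heavy-tailed distributions. The function name is hypothetical, and the Laplacian samples in the usage example stand in for a sparse response distribution; they are not measurements from natural images.

```python
import numpy as np

def excess_kurtosis(samples):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(samples, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0

# Usage: compare a Gaussian with a heavy-tailed (Laplacian) distribution.
rng = np.random.default_rng(0)
gaussian_kurtosis = excess_kurtosis(rng.standard_normal(500_000))
laplacian_kurtosis = excess_kurtosis(rng.laplace(size=500_000))
```

In this example the Gaussian value is close to 0 and the Laplacian value is close to its theoretical excess kurtosis of 3. A projection of image data with high kurtosis is one in which most responses are near zero and a few are large, which is the signature of a sparse code.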

Several different studies have now used ICA to characterize natural scenes (Bell and Sejnowski, 1997; Hyvarinen and Hoyer, 1999; Olshausen and Field, 1997) and have produced remarkably consistent results. In all cases the ICs are spatially localized, bandpass functions tuned for orientation and spatial frequency. Taken together these ICs can represent any natural scene, but very few of them appear in any specific scene. This form of representation is known as a sparse code.

The ICs of natural scenes are very similar to Gabor wavelets (each wavelet is a sinusoid weighted by a Gaussian envelope) that have been used as computational models of neurons in primary visual cortex (Daugman, 1988; Field, 1994; Navarro et al., 1996). When ICA is applied to videos of natural scenes containing moving objects (Szatmary and Lorincz, 2002; van Hateren and Ruderman, 1998), the ICs are localized in space-time, bandpass in spatial and temporal frequency, tuned for orientation, and direction selective. The close correspondence between the ICs of natural scenes (or movies) and the tuning properties of V1 neurons suggests that this area is optimized to process the sparsely distributed features in natural scenes. We discuss this issue more fully below.
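A Gabor wavelet of the kind mentioned above can be generated in a few lines. This is a minimal sketch with an isotropic Gaussian envelope; the function name and parameters (`size`, `wavelength`, `theta`, `sigma`, `phase`) are illustrative choices, not the parameterization of any particular cited model, and `size` is assumed to be odd so that the filter has a central pixel.

```python
import numpy as np

def gabor(size, wavelength, theta, sigma, phase=0.0):
    """Sinusoid weighted by a Gaussian envelope.

    size       : width and height in pixels (assumed odd)
    wavelength : period of the sinusoid in pixels (sets spatial frequency)
    theta      : orientation of the carrier in radians
    sigma      : standard deviation of the Gaussian envelope in pixels
    phase      : phase offset of the sinusoid in radians
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the sinusoid varies.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + phase)
    return envelope * carrier
```

The envelope makes the function spatially localized, the wavelength makes it bandpass in spatial frequency, and `theta` gives it an orientation preference, which are the three properties reported for the ICs of natural scenes.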

Independence, as used in ICA, has a specific meaning: the presence of one IC predicts nothing about the presence or absence of any other IC. However, there is no guarantee that the estimated ICs will indeed be completely independent. The ICA algorithm only ensures that they will be as independent as possible, given the data. The actual degree of independence is important; if the ICs of natural images are not entirely independent, further processing could recover additional information. The nonindependence of Gabor wavelet filters has been assessed directly in theoretical studies by examining their responses to natural images (Simoncelli and Schwartz, 1999; Wegmann and Zetzsche, 1990a; Zetzsche et al., 1999). The filters are not completely independent; when one wavelet filter is triggered by an image, another filter is also likely to respond. These correlations reflect higher-order structure in natural images that might be exploited by further nonlinear processing.
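The difference between uncorrelated and independent responses can be illustrated with a toy simulation. The sketch below is synthetic and is an assumption made for illustration, not an analysis of real filter outputs: two response variables share a common random scale factor, loosely standing in for two filters that are both driven by local image contrast. The function names are hypothetical.

```python
import numpy as np

def shared_scale_pair(n, seed=0):
    """Two zero-mean variables that share one random scale factor."""
    rng = np.random.default_rng(seed)
    scale = rng.uniform(0.5, 2.0, size=n)  # common factor, e.g. local contrast
    s1 = scale * rng.standard_normal(n)
    s2 = scale * rng.standard_normal(n)
    return s1, s2

def correlation(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

s1, s2 = shared_scale_pair(200_000)
linear = correlation(s1, s2)            # second-order dependence
energy = correlation(s1 ** 2, s2 ** 2)  # higher-order dependence
```

In this toy case `linear` is close to zero while `energy` is clearly positive: the two variables are uncorrelated, yet a large response in one makes a large response in the other more likely. A dependence of this kind is invisible to second-order statistics and is the sort of residual structure that further nonlinear processing could exploit.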

 


© 2010 The MIT Press