MIT CogNet, The Brain Sciences Connection. From the MIT Press.

The CogNet Library : References Collection
The Visual Neurosciences : Table of Contents : Spatial Channels in Vision and Spatial Pooling : Section 1

Evidence for spatial channels

Following the auditory analogy, Campbell and Robson (1968) measured detection thresholds for cosine gratings as a function of spatial frequency. At low temporal frequencies, the data (plotted as sensitivities which are reciprocals of threshold contrasts) describe a bandpass function, the CSF, with a peak at about 3 to 5 c/deg (Fig. 69.1). A convenient mathematical description of the CSF as a function of spatial frequency ω is given by Wilson and Giese (1977):

Figure 69.1. Typical CSF for sine wave gratings presented at low temporal frequency. The solid curve is from equation 1. The dashed notch depicts the effect of adaptation to the spatial frequency indicated.


$$ \mathrm{CSF}(\omega) = M\,\omega^{\alpha} \exp(-\omega / f) \tag{1} $$

where the peak frequency can easily be shown to be ω = αf. For low temporal frequency presentations, α ∼ 1, M = 150, and f = 5 give a good approximation to typical CSF data, and this function is plotted in Figure 69.1. Data obtained under transient or high temporal frequency conditions produce values of α ∼ 0.4.
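The peak location follows from setting the derivative of ln CSF(ω) = ln M + α ln ω − ω/f to zero, which gives α/ω − 1/f = 0 and hence ω = αf. A minimal Python sketch of equation 1 that checks this numerically is given here; the function names and the scanning grid are illustrative choices, not part of the original text.

```python
import math

def csf(omega, M=150.0, alpha=1.0, f=5.0):
    """Contrast sensitivity from equation 1: M * omega**alpha * exp(-omega / f).

    Defaults are the low temporal frequency values quoted in the text.
    """
    return M * omega ** alpha * math.exp(-omega / f)

def peak_frequency(alpha=1.0, f=5.0):
    """Analytic peak of equation 1, omega = alpha * f."""
    return alpha * f

# Scan 0.01 to 40 c/deg in 0.01 c/deg steps (an arbitrary illustrative grid)
# and pick the frequency with the highest sensitivity.
grid = [0.01 * k for k in range(1, 4001)]
numeric_peak = max(grid, key=csf)

print("numeric peak (c/deg):", numeric_peak)
print("analytic peak (c/deg):", peak_frequency())
print("sensitivity at peak:", csf(peak_frequency()))
```

With α = 0.4 and the same f, the same relation places the peak at 2 c/deg.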

To test the idea that the CSF might represent the envelope of many more narrowly tuned channels, Campbell and Robson (1968) conducted a summation experiment. Their logic was simple: measure the threshold for a complex spatial pattern composed of many widely separated spatial frequencies. If that threshold was simply a weighted sum of the thresholds for the individual components, the CSF must describe the properties of a single spatial channel that summed all spatial frequencies proportionately. If, however, the complex pattern reached threshold only when one of its spatial frequency components reached threshold independently, then the CSF must be the envelope of many spatial channels tuned to narrow ranges of spatial frequencies. The data, obtained using square wave gratings, were unambiguous: the visual system did not add up all spatial frequencies equally but rather behaved like a bank of independent spatial channels, each sensitive to a different range of spatial frequencies.
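The two competing predictions can be illustrated with a rough Python sketch. It relies on the standard Fourier result that a square wave of contrast c contains odd harmonics k = 1, 3, 5, ... with amplitudes 4c/(πk), and it uses equation 1 as the sensitivity to each component. The single-channel rule below (sensitivity-weighted amplitudes add linearly, ignoring phase) is a deliberately crude assumption of this sketch, not the analysis used by Campbell and Robson; all function names are hypothetical.

```python
import math

def csf(omega, M=150.0, alpha=1.0, f=5.0):
    """Equation 1 with the low temporal frequency parameters from the text."""
    return M * omega ** alpha * math.exp(-omega / f)

def square_wave_components(fundamental, n_harmonics=25):
    """(frequency, amplitude per unit contrast) for the odd harmonics of a
    square wave: amplitude 4 / (pi * k) at frequency k * fundamental."""
    return [(k * fundamental, 4.0 / (math.pi * k))
            for k in range(1, 2 * n_harmonics, 2)]

def threshold_independent(fundamental):
    """Multiple-channel prediction: the pattern is detected as soon as its
    most detectable component reaches its own threshold."""
    return min(1.0 / (amp * csf(freq))
               for freq, amp in square_wave_components(fundamental))

def threshold_single_channel(fundamental):
    """Single-channel prediction under a crude assumption: the weighted
    component amplitudes simply add, and detection occurs when the sum is 1."""
    total = sum(amp * csf(freq)
                for freq, amp in square_wave_components(fundamental))
    return 1.0 / total

f0 = 5.0  # c/deg, an arbitrary illustrative fundamental
print("independent channels:", threshold_independent(f0))
print("single summing channel:", threshold_single_channel(f0))
```

Under the independent-channel rule the square wave threshold is set by the fundamental alone, so it is π/4 times the threshold of a sine wave grating at the same frequency; the summing rule predicts a lower threshold because the harmonics also contribute.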

In the wake of this discovery, which supported the analogy to audition, the race was on to characterize the spatial frequency bandwidths of individual channels. The earliest approach employed spatial frequency adaptation, and results were published almost simultaneously by Pantle and Sekuler (1968) and Blakemore and Campbell (1969). In this technique, subjects viewed a high-contrast grating of fixed spatial frequency ω for several minutes, moving their eyes across the bars to minimize conventional afterimage formation. Measurement of the CSF after this adaptation period revealed a notch of depressed sensitivities centered on ω, as illustrated in Figure 69.1. Bandwidth estimates based on the adaptation technique fell in the 1 to 2 octave range. (One octave is a factor of 2; 2 octaves are a factor of 4; so n octaves are a factor of 2^n.) Thus, adaptation studies suggested that spatial frequencies differing by a factor of 2 to 4 would be processed by independent spatial channels. Strong support for this range was provided by the classic study of Graham and Nachmias (1971), who showed convincingly that spatial frequencies differing by a factor of 3 (i.e., 1.6 octaves) were processed independently at threshold.
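The octave arithmetic used throughout this section is just a base-2 logarithm of the frequency ratio. A short Python sketch (helper names are illustrative):

```python
import math

def octaves(ratio):
    """Number of octaves separating two frequencies with the given ratio."""
    return math.log2(ratio)

def ratio_from_octaves(n):
    """Frequency ratio corresponding to n octaves, i.e. 2 to the power n."""
    return 2.0 ** n

print(octaves(3))               # factor of 3, roughly 1.6 octaves
print(ratio_from_octaves(1.0))  # 1 octave is a factor of 2
print(ratio_from_octaves(2.0))  # 2 octaves are a factor of 4
```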

The subsequent 14 years saw many attempts to measure spatial channel bandwidths more precisely using a variety of techniques. Several approaches using a technique known as subthreshold summation produced bandwidth estimates as narrow as 0.33 octave (Kulikowski and King-Smith, 1973; Sachs et al., 1971). Subsequent work, however, showed that these figures were artifactually narrowed as a consequence of the spatial beat pattern that occurs when cosine gratings of very similar spatial frequencies are added together (see Wilson, 1991, and Wilson and Wilkinson, 1997, for further discussion).

An oblique masking technique that was not subject to these problems was developed in our laboratory (Wilson et al., 1983). High-contrast masking gratings oriented at an angle of 15 degrees from vertical were superimposed on vertical test patterns with a 1 octave bandwidth (sixth spatial derivatives of Gaussians, D6s). In masking experiments the threshold elevation is defined as the ratio of the test pattern threshold in the presence of the masking grating to the test threshold measured with no mask present, so a threshold elevation of 1.0 would indicate that the mask had no effect on the test. In our experiments, the spatial frequencies of both test D6s and masking gratings were varied in half-octave steps from 0.25 up to 22.6 c/deg so that every test D6 was paired with a wide range of mask spatial frequencies. The resulting 14 threshold elevation curves, each obtained with a different D6 test spatial frequency, were fit quantitatively with a set of just six underlying visual channels tuned to peak frequencies of 0.8, 2.0, 2.8, 4.0, 8.0, and 16.0 c/deg (Wilson et al., 1983). The spatial frequency tuning curves for four of these visual channels are plotted in Figure 69.2. Note that the envelope of these curves describes the CSF fairly well. Although the notion of just six visual channels in the fovea remains controversial, it is nevertheless true that these six suffice to encode all of the spatial frequency information present in the stimulus; more than six channels would be redundant. Furthermore, shifting peak channel sensitivities to lower spatial frequencies in the visual periphery produces an effective continuum of channel tunings across the visual system (Swanson and Wilson, 1985).
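As a quick check on the stimulus set described above, half-octave steps starting at 0.25 c/deg reach 22.6 c/deg after 13 steps, which yields the 14 test frequencies and hence the 14 threshold elevation curves. The Python sketch below also spells out the threshold elevation ratio; the function names and the example threshold values are hypothetical.

```python
def half_octave_frequencies(start=0.25, count=14):
    """Spatial frequencies in half-octave steps: start * 2**(k / 2)."""
    return [start * 2.0 ** (k / 2.0) for k in range(count)]

def threshold_elevation(masked_threshold, unmasked_threshold):
    """Ratio of the test threshold with the mask to the threshold without it.

    A value of 1.0 means the mask had no effect on the test pattern.
    """
    return masked_threshold / unmasked_threshold

freqs = half_octave_frequencies()
print(len(freqs), [round(f, 2) for f in freqs])

# Made-up thresholds, for illustration only.
print(threshold_elevation(0.06, 0.02))
```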

Figure 69.2. CSF with four underlying spatial frequency selective visual channels. The channels are most sensitive at 2, 4, 8, and 16 c/deg, and their envelope provides a good fit to the CSF above about 0.75 c/deg. (Data from Wilson et al., 1983.)


In a subsequent experiment, Phillips and Wilson (1984) used masking to measure orientation bandwidths, and these were found to vary from ±30 degrees at half amplitude for the lowest spatial frequencies down to ±15 degrees for the highest spatial frequency mechanisms. These estimates of both spatial frequency and orientation bandwidths obtained by masking were in good quantitative agreement with bandwidths of single neurons in macaque area V1 as measured by De Valois et al. (1982). A graph comparing both spatial frequency and orientation bandwidths in macaques and humans can be found elsewhere (Wilson, 1991). The two-dimensional spatial receptive fields, RF(x,y), of these visual channels could be well described using combinations of Gaussian functions

$$ RF(x,y) = A\left\{ \exp\left(-x^{2}/\sigma_{1}^{2}\right) - B\exp\left(-x^{2}/\sigma_{2}^{2}\right) + C\exp\left(-x^{2}/\sigma_{3}^{2}\right) \right\} \exp\left(-y^{2}/\sigma_{y}^{2}\right) \tag{2} $$

and a table of the various constants can be found in Wilson (1991). Furthermore, the fact that both spatial frequency and orientation bandwidths decrease with increasing peak frequency indicates that spatial processing by the visual system cannot be accurately described by a wavelet transform. Indeed, visual channels are performing neither a wavelet nor a Fourier transform (Wilson and Wilkinson, 1997). Rather, the data corroborate the early hypothesis of Thomas (1970) that visual channels reflect properties of cortical receptive fields of varying size and preferred orientation.
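Equation 2 is straightforward to evaluate directly. The Python sketch below does so; the constants in `params` are hypothetical placeholders chosen only to make the example run and are not the values tabulated in Wilson (1991).

```python
import math

def receptive_field(x, y, A, B, C, s1, s2, s3, sy):
    """Equation 2: a sum of three Gaussians in x multiplied by a Gaussian in y."""
    x_profile = (math.exp(-x ** 2 / s1 ** 2)
                 - B * math.exp(-x ** 2 / s2 ** 2)
                 + C * math.exp(-x ** 2 / s3 ** 2))
    return A * x_profile * math.exp(-y ** 2 / sy ** 2)

# Hypothetical placeholder constants (space constants in degrees).
# These are NOT the published values; see Wilson (1991) for those.
params = dict(A=1.0, B=0.5, C=0.1, s1=0.05, s2=0.10, s3=0.20, sy=0.15)

# Sample the horizontal profile through the receptive field center.
for x in (0.0, 0.05, 0.10, 0.20):
    print(x, receptive_field(x, 0.0, **params))
```

At the center the expression reduces to A(1 − B + C), and the profile is symmetric in both x and y.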

In order to infer the tuning curves in Figure 69.2, it was necessary to determine how masking varied with mask contrast. In a pioneering study, Nachmias and Sansbury (1974) had shown that threshold elevations due to masking were described by the “dipper-shaped” function of mask contrast depicted in Figure 69.3. As mask contrast increased from zero, test pattern thresholds initially decreased by a factor of about 2. At higher mask contrasts, however, test pattern thresholds rose substantially, resulting in large threshold elevations. Nachmias and Sansbury suggested that the masking dipper function reflected properties of a contrast nonlinearity in visual channels that was accelerating at subthreshold contrasts but compressive at suprathreshold contrasts. A suitable form for this function was found to be

Figure 69.3. Typical contrast increment threshold functions. The solid curve illustrates grating contrast increment thresholds as a function of mask grating contrast when mask and test gratings are at the same orientation, 0 degrees. Note the characteristic dip at the location of the arrow, which was first reported by Nachmias and Sansbury (1974). The dashed line shows the effect of a masking grating at an angle of 22.5 degrees relative to the test grating. As reported by Foley (1994), the dip disappears in this case.


$$ F(C) = \frac{M\,C^{N+\varepsilon}}{\alpha^{N} + C^{N}} \tag{3} $$

where M and α are constants. Typically, 2 ≤ N ≤ 4, and ε ∼ 0.5 (Wilson, 1980; Wilson et al., 1983). Note that if ε = 0, this is just a Naka-Rushton (1966) function of the type that has been successfully used to describe the responses of V1 cortical neurons (Albrecht and Hamilton, 1982; Sclar et al., 1990; see Chapter 47). This contrast nonlinearity generates the dipper function in Figure 69.3 on the assumption that contrast increments at threshold, Δ, are defined by the relation

$$ F(C+\Delta) - F(C) = 1 \tag{4} $$

Masking data were corrected for this contrast nonlinearity to obtain the tuning curves plotted in Figure 69.2 (Wilson et al., 1983).
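To see how equations 3 and 4 together produce a dipper, one can solve equation 4 numerically for Δ at several mask contrasts. The Python sketch below does this by bisection. The parameter values (M = 10, α = 1, with contrast expressed in arbitrary units) are hypothetical choices made only so the example shows the effect; N = 2 and ε = 0.5 fall within the ranges quoted in the text.

```python
def contrast_response(C, M=10.0, alpha=1.0, N=2.0, eps=0.5):
    """Equation 3: M * C**(N + eps) / (alpha**N + C**N)."""
    return M * C ** (N + eps) / (alpha ** N + C ** N)

def increment_threshold(C, **kw):
    """Solve equation 4, F(C + delta) - F(C) = 1, for delta by bisection."""
    base = contrast_response(C, **kw)

    def excess(delta):
        return contrast_response(C + delta, **kw) - base - 1.0

    # Grow the upper bracket until the response difference exceeds 1.
    # With eps = 0 the function saturates and a solution may not exist.
    hi = 1.0
    for _ in range(200):
        if excess(hi) >= 0.0:
            break
        hi *= 2.0
    else:
        raise ValueError("no increment threshold found for these parameters")

    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for C in (0.0, 0.5, 2.0, 10.0, 30.0):
    print(C, round(increment_threshold(C), 3))
```

With these placeholder values the threshold at a low mask contrast falls below the unmasked threshold and then rises above it at high mask contrasts, which is the qualitative dipper shape; the exact depth of the dip depends on the parameters chosen.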

This completes a brief historic review of visual channels as they were understood around 1990 (Graham, 1989; Wilson, 1991). To predict the response of an array of psychophysically defined cortical neurons, the receptive field description in equation 2 was first convolved with the luminance profile of the stimulus. Following this, the nonlinearity in equation 3 was applied pointwise to the convolution result to produce an estimate of neural responses to the pattern in question (see Wilson, 1991, for details). Thus, visual channels were assumed to process the retinal image entirely independently and in parallel. As we shall see, this formulation was radically altered by research during the subsequent decade.
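The two-stage computation just described (linear filtering followed by a pointwise nonlinearity) can be sketched in one spatial dimension as follows. This is a simplified illustration, not the published model: it uses only the x profile of equation 2, hypothetical constants throughout, zero padding at the edges of the stimulus, and it applies equation 3 to the magnitude of the filtered signal while restoring its sign, which is an assumption of this sketch.

```python
import math

def rf_profile(x, B=0.5, C=0.1, s1=0.05, s2=0.10, s3=0.20):
    """x dependence of equation 2 with A = 1 (hypothetical constants)."""
    return (math.exp(-x ** 2 / s1 ** 2)
            - B * math.exp(-x ** 2 / s2 ** 2)
            + C * math.exp(-x ** 2 / s3 ** 2))

def nonlinearity(r, M=10.0, alpha=1.0, N=2.0, eps=0.5):
    """Equation 3 applied to |r|, with the sign of r restored (assumption)."""
    mag = abs(r)
    return math.copysign(M * mag ** (N + eps) / (alpha ** N + mag ** N), r)

def channel_responses(luminance, dx, half_width=0.5):
    """Convolve the luminance profile with the receptive field profile,
    then apply the contrast nonlinearity at every position."""
    n = int(round(half_width / dx))
    kernel = [rf_profile(k * dx) * dx for k in range(-n, n + 1)]
    out = []
    for i in range(len(luminance)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + n - j  # discrete convolution index
            if 0 <= idx < len(luminance):  # zero padding outside the stimulus
                acc += w * luminance[idx]
        out.append(nonlinearity(acc))
    return out

# Illustrative stimulus: a 4 c/deg cosine grating sampled every 0.01 deg
# over 4 degrees, with contrast in the same arbitrary units as alpha.
dx = 0.01
xs = [-2.0 + dx * i for i in range(401)]
grating = [20.0 * math.cos(2.0 * math.pi * 4.0 * x) for x in xs]
responses = channel_responses(grating, dx)
print(max(responses), min(responses))
```

Because each position is computed from the stimulus alone, the sketch makes explicit the assumption stated above that channels operate independently and in parallel.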

 


© 2010 The MIT Press