The Visual Neurosciences: Eye Movements in Daily Life

Part 1: The relations of eye movements to action in different activities

Activities involving the uptake of visual information can be divided into those that are basically sedentary—reading or typing, for example—and those such as carpentry or ball sports that involve more extensive movement. The former have been studied since eye movement recordings began (e.g., Dodge, 1900), but the latter have been open to investigation only since about 1960, when portable recording devices started to become available.

Sedentary Activities: Reading, Musical Sight-reading, and Typing

Although silent reading involves no overt action, it nevertheless requires a particular eye movement strategy to enable the uptake of information in a way that allows meaning to be acquired. It is also one of the best-studied (as well as the most atypical) examples of a clearly defined eye movement pattern. Erdmann and Dodge (Dodge, 1900) first showed that the subjectively smooth passage of the eye across the page is in reality a series of saccades and fixations in which information is taken in during the fixations. Eye movements in reading are highly constrained to a linear progression of fixations to the right (in English) across the page, which allows the words to be read in an interpretable order. In this, reading differs from many other activities (such as viewing pictures), where order is much less important (Buswell, 1935). Reading is a learned skill, but the eye movements that go with it are not taught. Nevertheless, they are remarkably similar among normal readers. Eye movements during reading have recently been reviewed thoroughly, and only the principal facts need to be included here. Most of what follows is derived from an extensive review by Rayner (1998).

During normal reading, gaze (foveal direction) moves across the line of print in a series of saccades, whose size is typically seven to nine letters. Within limits this number is not affected by the print size, implying that the oculomotor system is able to make scaling adjustments to its performance. For normal print the saccade size is 1 to 2 degrees, and the durations of the fixations between saccades have a mean of 225 msec. In reading aloud, fixations are longer (mean, 275 msec). Most saccades (in English) are to the right, but 10% to 15% are regressions (right to left) and are associated in a poorly understood way with problems in processing the currently or previously fixated word. Words can be identified up to 7 to 8 letter spaces to the right of the fixation point, but some information is available up to 14 to 15 letter spaces; this is used in the positioning of subsequent saccade end points. From studies in which words were masked during fixations, it appears that the visual information needed for reading is taken in during the first 50 to 70 msec of each fixation. Adult readers typically read at a rate of about 300 words per minute or 0.2 second per word.
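These figures are mutually consistent; a quick arithmetic sketch (the six-characters-per-word figure is my assumption, not from the text):

```python
# Rough consistency check of the reading statistics quoted above.
# ASSUMPTION (not from the text): an average English word occupies
# about 6 character spaces (five letters plus a space).
CHARS_PER_WORD = 6

saccade_letters = 8      # midpoint of the 7-9 letter saccade size
fixation_s = 0.225       # mean fixation duration in silent reading

words_per_second = (saccade_letters / CHARS_PER_WORD) / fixation_s
words_per_minute = 60 * words_per_second

print(round(words_per_minute))  # about 356 wpm before saccade time
```

Adding the 25 to 30 msec spent in each saccade brings the figure down toward the roughly 300 words per minute quoted above.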

If changes are made to the text during the course of a fixation, both the duration of the current fixation and the size of the following saccade can be affected. This implies that the text is processed “on-line” on a fixation-by-fixation basis. Similarly, difficult words result in longer fixations, indicating that cognitive processes operate within single fixations. How long it takes to process words all the way from vision to meaning is hard to assess, but in reading aloud, the time between fixating a word and speaking it (the eye-voice span) is about 1 second (Buswell, 1920).

Musical sight reading shares with text reading the constraint that gaze must move progressively to the right. It is, however, more complicated in that—for keyboard players—there are two staves from which notes must be acquired (Fig. 91.1A). Weaver (1943) recorded eye movements of trained pianists and found that they alternated fixation between the upper and lower staves, acquiring notes from the score at a rate of approximately 1.5 notes per fixation (making a note roughly equivalent to a word in text reading). This alternation means that notes that have to be played together are viewed at different times, adding the task of temporal assembly to the other cognitive tasks of interpreting the pitch and length of the notes. The time from reading a note to playing it (the eye-hand span) is similar to reading aloud: about 1 second. Furneaux and Land (1999) looked at the eye-hand span in pianists of differing abilities. They found that it did not vary with skill level when measured as a time interval, but that when measured in terms of the number of notes contained in that interval, professionals averaged four compared with two for novices. Thus, the processing time is the same for everyone, but the throughput rate of the processor is skill dependent. The processing time did alter with tempo, however, with fast pieces having an eye-hand span of 0.7 second, increasing to 1.3 seconds for slow pieces.

Figure 91.1.

a, Fixations made by an expert accompanist while sight reading a passage from a sonata by Domenico Scarlatti at full speed. The shaded circles are fixations followed by glances down to the keyboard. Note that these have no detectable effect on the fixation sequence. b, Fixations during copy typing. The upper line of each pair shows the text being read, with fixations numbered in order. The lower line shows the text as typed, and the diagonal lines show the letters being typed at the time of the indicated fixations on the upper row. The eye-hand span for this fast typist is about five letter spaces. (a from Land and Furneaux, 1997; b from Butsch, 1932.)


Copy typing, like music playing, has a motor output, and according to Butsch (1932), typists of all skill levels attempt to keep the eyes 1 second ahead of the currently typed letter, much the same interval as in music reading. This represents about five characters (Fig. 91.1B). More recently, Inhoff and colleagues (Inhoff and Wang, 1992) found more variability in the eye-hand span and also showed that it was affected by the nature of the text. Using a moving window technique, they showed that typing starts to become slower when there are fewer than three letter spaces to the right of fixation, indicating a perceptual span about half the size of that used in normal reading. The potential word buffer is much bigger than this, however. Fleischer (1986) found that typists typing continuously alternate between reading and checking phases of approximately 1 second each, typically taking in 11 characters during the reading phase; exceptionally, strings of up to 30 characters can be stored.

These activities are all similar in that they involve the continuous processing of a stream of visual information taken in as a series of stationary fixations. This information is translated and converted to a stream of muscular activity of various kinds (or into meaning in the case of silent reading). In each case, the time within the processor is about 1 second. Once the appropriate action has been performed the original visual information is lost, so the process is more like a production line than a conventional memory system.
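The production-line analogy can be caricatured as a first-in, first-out buffer with a fixed residence time of about 1 second, whose contents are discarded once acted upon. A toy sketch (the class and all numbers are illustrative, not from any of the studies cited):

```python
from collections import deque

class PerceptionActionPipeline:
    """Toy FIFO: visual items enter at fixation, emerge as actions
    about `latency_s` later, and are discarded once acted on
    (no lasting store, unlike a conventional memory system)."""

    def __init__(self, latency_s: float = 1.0):
        self.latency_s = latency_s
        self.buffer = deque()  # holds (item, time_entered) pairs

    def fixate(self, item, t: float):
        self.buffer.append((item, t))

    def act(self, t: float):
        """Return items whose latency has elapsed; they leave the system."""
        done = []
        while self.buffer and t - self.buffer[0][1] >= self.latency_s:
            done.append(self.buffer.popleft()[0])
        return done

p = PerceptionActionPipeline(latency_s=1.0)
p.fixate("word1", 0.0)
p.fixate("word2", 0.25)
print(p.act(1.0))   # ['word1'] -- entered 1.0 s ago, now acted on and gone
print(p.act(1.3))   # ['word2']
```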

Activities Involving Movement

The study of the gaze movements of head-free, fully mobile subjects required the development of eye trackers that were head- rather than bench-mounted. The first truly mobile apparatus for freely moving subjects was devised by Mackworth and Thomas (1962). It consisted of a helmet-mounted 8 mm movie camera which viewed the scene ahead. Superimposed on this was a spot derived from the corneal reflex of one eye and conveyed to the camera by an inverted periscope arrangement. This device was successfully used to study eye movements in driving and flying, as well as during more stationary activities (Thomas, 1968). A decade later, small video cameras started to become available, making the recording process much simpler, and a number of video-based eye-movement cameras are currently available commercially. They typically consist of two cameras, one viewing the scene ahead and one viewing the eye. Commonly, the eye is illuminated with infrared light, which provides the eye camera with a “bright pupil” from infrared light reflected from the retina. This is tracked with appropriate software to provide the coordinates of eye direction, which can then be used to position a spot or a cross on the scene video, indicating the instantaneous direction of regard of the fovea. A variant of this design uses the outline of the iris to derive eye direction (Land, 1993). Such devices can be used for virtually any activity. Search-coil methods have also been used successfully to record gaze direction or eye and head direction separately (Collewijn, 1977). Their main limitation is that they can only be used within the magnetic field of the surrounding coils, which means that their versatility is not much greater than that of fixed-head methods.

Eye Movements and Actions in Domestic Tasks

Activities such as food preparation, carpentry, or gardening typically involve a series of different actions, rather loosely strung together by a “script.” They provide examples of the use of tools and utensils, and it is of obvious interest to find out how the eyes assist in the performance of these tasks.

Land et al. (1999) studied the eye movements of subjects while they made cups of tea. When tea is made with a teapot, this simple task involves about 45 separate acts (defined as “simple actions that transform the state or place of an entity through manual manipulation”; Schwartz et al., 1991). Figure 91.2 shows the 26 fixations made during the first 10 seconds of the task. The subject first examines the kettle (11 fixations), picks it up, and looks toward the sink (3 fixations), walks to the sink while removing the lid from the kettle (inset: 4 fixations), places the kettle in the sink and turns on the tap (3 fixations), then watches the water as it fills the kettle (4 fixations). There is only one fixation that is not directly relevant to the task (to the sink tidy on the right). Two other subjects showed remarkably similar numbers of fixations when performing the same sequence. The principal conclusions from this sequence are:

1. Saccades are made almost exclusively to objects involved in the task, even though there are plenty of other objects around to grab the eye.

2. The eyes deal with one object at a time. This corresponds roughly to the duration of the manipulation of that object and may involve a number of fixations on different parts of the object.

Figure 91.2.

a, Fixations and saccades made during the first 10 seconds of the task of making a cup of tea (lifting the kettle and starting to fill it). Note that all but one of the fixations are on objects immediately relevant to the task. b, Plot of eye, head, and gaze movements toward the end of the task in a, showing the steady gaze fixations resulting from eye saccades and compensation for head movement by the vestibulo-ocular reflex. (From Land et al., 1999.)


There is usually a clear “defining moment” when the eyes leave one object and move on to the next, typically with a combined head and eye saccade. These saccades can be used to “chunk” the task as a whole into separate object-related actions, and they can act as time markers to relate the eye movements to movements of the body and manipulations by the hands. In this way, the different acts in the task can be pooled to get an idea of the sequence of events in a typical act. The results of this are shown in Figure 91.3. Perhaps surprisingly, it is the body as a whole that makes the first movement in an object-related action. Often the next object in the sequence is on a different work surface, and this may necessitate a turn or a few steps before it can be viewed and manipulated. About 0.5 second later, the first saccade is made to the object, and 0.5 second later still, the first indications of manipulation occur. The eyes thus lead the hands. Interestingly, at the end of each action, the eyes move on to the next object about 0.5 second before manipulation is complete. Presumably the information that they have supplied remains in a buffer until the motor system requires it.

Figure 91.3.

Timing of the relations of body, eye, and hand movements averaged from a total of 137 object-related actions during tea-making sessions by three different individuals. Movements of the whole body precede the first fixation by a mean of 0.61 second (a), and these precede the first signs of manipulation by 0.56 second (b). As each action draws to a close, fixation moves to the next object an average of 0.61 second before the end of the preceding action (c). (From Land and Hayhoe, 2001.)


Almost identical results were obtained by Hayhoe (2000) in a study of students making peanut butter and jelly sandwiches. She found the same attachment of gaze to task-related objects and the same absence of saccades to irrelevant objects. As with the tea making, gaze led manipulation, although with a somewhat shorter interval. This difference is probably attributable to the fact that the sandwich making was a sit-down task involving only movements of the arms. Two other differences that may have the same cause are the existence of more short-duration (<120 msec) fixations than in the tea-making study and the presence of more unguided reaching movements (13%) mostly concerned with the setting down of objects. There was a clear distinction in both studies between within-object saccades, which had mean amplitudes of about 8 degrees in both, and between-object saccades, which were much larger, up to 30 degrees in sandwich making on a restricted table top and 90 degrees in tea making in the less restricted kitchen (Land and Hayhoe, 2001).

Driving

Driving is a complex skill that involves dealing with the road itself (steering, speed control), other road users (vehicles, cyclists, moving and stationary pedestrians), and attention to road signs and other relevant sources of information. It is thus a multifaceted task, and one would expect a range of eye movement strategies to be employed. I will first consider steering, as this is a prerequisite for all other aspects of driving.

When steering a car on a winding road, vision has to supply the arms and hands with the information they need to turn the steering wheel the right amount. What is this control signal, and how is it obtained? As pointed out by Donges as early as 1978, there are basically two sorts of signal available to drivers: feedback signals (lateral and angular deviation from the road centerline, differences between the road's curvature and the vehicle's path curvature) and feedforward or anticipatory signals obtained from more distant regions of the road up to 2 seconds ahead in time (corresponding to 90 ft or 27 m at 30 mph). Donges (1978) used a driving simulator to demonstrate that each of these signals was indeed used in steering, although he did not discuss how they might be obtained visually.
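The distance quoted in parentheses for a 2 second preview follows directly from the speed; a one-line unit check (the conversion factors are standard, not from the text):

```python
# Unit check: how far ahead is a 2 s preview at 30 mph?
MPH_TO_MS = 0.44704               # 1 mph in m/s (exact by definition)
M_TO_FT = 1 / 0.3048              # metres to feet (exact by definition)

preview_m = 30 * MPH_TO_MS * 2    # 26.8 m -- the "27 m" in the text
preview_ft = preview_m * M_TO_FT  # 88 ft -- close to the "90 ft" quoted
print(round(preview_m, 1), round(preview_ft))
```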

Eye movement studies both on real roads (Land and Lee, 1994) and on a simulator (Land and Horwood, 1995) have confirmed Donges' two-level model of driving and have gone some way toward establishing how the eyes find the appropriate information. Earlier studies, mainly on U.S. roads that had predominantly low curvatures, had found only a weak relationship between gaze direction and steering (e.g., Zwahlen, 1993). However, on a winding road in Scotland, where continuous visual control was essential, a much more precise relationship was seen. Land and Lee (1994) found that drivers spent much of their time looking at the tangent point on the upcoming bend (Fig. 91.4B). This is the moving point on the inside of the bend where the driver's line of sight is tangential to the road edge; it is the point that protrudes into the road, and it is the only point in the flow field that is clearly defined visually. The angular location of this point relative to the vehicle's line of travel (effectively the driver's trunk axis if he or she is belted in) predicts the curvature of the bend: larger angles indicate steeper curvatures. Potentially, this angle is a signal that can provide the feedforward information required by Donges' analysis, and Figure 91.4A does indeed show that curves of gaze direction and steering wheel angle are almost identical. The implication is that this angle, which is equal to the eye-in-head plus the head-in-body angle when the driver is looking at the tangent point, is translated more or less directly into the motor control signal for the arms. Cross-correlating the two curves in Figure 91.4A shows that gaze direction precedes steering wheel angle by about 0.8 second. This provides the driver with a reasonable comfort margin, but the delay is also necessary to prevent steering from taking place before the bend has been reached.
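The link between tangent-point angle and bend curvature follows from simple circle geometry: for a driver a lateral distance d from the inside road edge on a bend of radius r, the sight line to the tangent point makes an angle θ with the heading, where cos θ = r/(r + d); for small angles this gives θ ≈ √(2d/r), so the angle grows as the radius shrinks. A sketch of that geometry (the numerical values are illustrative, not from the text):

```python
import math

def tangent_point_angle(d: float, r: float) -> float:
    """Exact angle (radians) between heading and tangent point for a
    driver d metres from the inside edge of a bend of radius r metres:
    cos(theta) = r / (r + d), from tangent-to-a-circle geometry."""
    return math.acos(r / (r + d))

def tangent_point_angle_approx(d: float, r: float) -> float:
    """Small-angle approximation: theta ~ sqrt(2 d / r)."""
    return math.sqrt(2 * d / r)

# Illustrative values: driver 2 m from the inside edge. Halving the
# bend radius enlarges the angle -- steeper bends give larger
# tangent-point angles, as the text describes.
for r in (120.0, 60.0, 30.0):
    exact = math.degrees(tangent_point_angle(2.0, r))
    approx = math.degrees(tangent_point_angle_approx(2.0, r))
    print(f"r = {r:5.0f} m: theta = {exact:4.1f} deg (approx {approx:4.1f})")
```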

Figure 91.4.

Eye movements and steering. a, Simultaneous record of gaze direction and steering wheel angle during part of a drive on a winding road in Scotland. Note the extreme similarity of the two records. Cross-correlation shows that eye direction precedes steering by about 0.8 second. Brief glances off-road also occur. b, Contour plots showing the distribution of fixations by three drivers during right and left bends on a 1 km stretch of the same winding road as in a. Both are centred within 1 degree of the respective tangent points. Plots are normalized to the central value, which is approximately 0.12 fixations per square degree per second. About 35% of fixations lie outside the 0.2 contour, and these are widely scattered. c, Time sharing between tasks. The driver's gaze alternates between the tangent point and the cyclist, spending approximately 0.5 second on each. Steering is linked to the road edge and suspended when looking at the cyclist. (From Land, 1998.)


Simulator studies showed that feedforward information from the distant part of the road was not sufficient on its own to provide good steering (Land and Horwood, 1995). When the near region of the simulated road was removed from view, curvature matching was still accurate, but position-in-lane control was very poor. To maintain good lane position required a view of the road only a few meters ahead, and this region provided much of the feedback information identified in the Donges model. Interestingly, this part of the road was rarely fixated compared with the more distant tangent point region, but it was certainly seen and used; it is typically about 5 degrees obliquely below the tangent point. Mourant and Rockwell (1970) had already concluded that lane position is monitored with peripheral vision. They also argued that learner drivers first use foveal vision for lane keeping, then increasingly move foveal gaze to more distant road regions and learn to use their peripheral vision to stay in the lane. Summala et al. (1996) reached similar conclusions. The principal conclusion from these studies is that neither the far-road feedforward input nor the near-road feedback input is sufficient on its own, but that the combination of the two allows fast, accurate driving (Land, 1998).

A feature of Figure 91.4A and similar records is that the eyes are not glued to the tangent point, but take time out to look at other things. These excursions are accomplished by gaze saccades and typically last between 0.5 and 1 second. The probability of these off-road glances occurring varies with the stage of the bend that the vehicle has reached, and they are least likely to occur around the time of entry into a new bend. At this point, drivers fixate the tangent point 80% of the time. It seems that special attention is required at this time, presumably to get the initial estimate of the bend's curvature correct. A confirmation of this came from Yilmaz and Nakayama (1995), who used reaction times to a vocal probe to show that attention was diverted to the road just before simulated bends and that sharper curves demanded more attention than shallower ones. The fewer and shallower the bends in the road, the more time can be spent looking off the road, and this probably accounts for the lack of a close relation between gaze direction and steering in studies of driving on freeways and other major roads.

Sometimes the eye must be used for two different functions at the same time, and as there is only one fovea and off-axis vision is poor, the visual system has to resort to time sharing. A good example of this is shown in Figure 91.4C, where the driver is negotiating a bend and so needs to look at the tangent point while passing a cyclist who needs to be checked on repeatedly. The record shows that the driver alternates gaze between the tangent point and cyclist several times, spending 0.5 second on each. The lower record shows that he or she steers by the road edge, which means that the coupling between eye and hand has to be turned off when the driver views the cyclist (who would otherwise be run over!). Thus, not only does gaze switch between tasks, so does the whole visuomotor control system. Presumably, while looking at the cyclist, the information from the tangent point is kept “on hold” at its previous value.

This example has shown how the visual system is able to divide its time between different activities. In urban driving this is even more important, as each traffic situation and road sign competes for attention. To my knowledge, there has been no systematic study of where drivers look in traffic, but from our own observations, it is clear that drivers foveate the places from which they need to obtain information: the car in front, the outer edges of obstacles, pedestrians and cyclists, road signs and traffic lights, and so on. In general, speeds of 30 mph or less require only peripheral lane-edge (feedback) information for adequate steering. Thus, the necessity to use distant tangent points is greatly reduced, freeing up the eyes for the multiple demands of dealing with other road users. As with open-road steering, both foveal and peripheral vision are involved. Miura (1987) has shown that as the demands of traffic situations increase, peripheral vision is sacrificed to provide greater attentional resources for information uptake by the fovea.

Ball Games

Some ball sports are so fast that there is barely time for the player to use his normal oculomotor machinery. Within less than 0.5 second (in baseball or cricket), the batter has to judge the trajectory of the ball and formulate a properly aimed and timed stroke. The accuracy required is a few centimeters in space and a few milliseconds in time (Regan, 1992). Half a second gives time for one or at most two saccades, and the speeds involved preclude smooth pursuit for much of the ball's flight. How do practitioners of these sports use their eyes to get the information they need?

Part of the answer is anticipation. Ripoll et al. (1987) found that international table tennis players anticipated the bounce and made a saccade to a point close to the bounce point. Land and Furneaux (1997) confirmed this (with more ordinary players). They found that shortly after the opposing player hit the ball, the receiver made a saccade down to a point a few degrees above the bounce point, anticipating the bounce by about 0.2 second. At other times, the ball was tracked around the table in a normal nonanticipatory way; tracking was almost always by means of saccades rather than smooth pursuit. The reason players anticipate the bounce is that the location and timing of the bounce are crucial in the formulation of the return shot. Until the bounce occurs, the trajectory of the ball as seen by the receiver is ambiguous. Seen monocularly, the same retinal pattern in space and time would arise from a fast ball on a long trajectory or a slow ball on a short one (Fig. 91.5A). (Whether either stereopsis or looming information is fast enough to provide a useful depth signal is still a matter of debate.) This ambiguity is removed the instant the timing and position of the bounce are established. Therefore, the strategy of the player is to get gaze close to the bounce point (this cannot and need not be exact) before the ball does and lie in wait. The saccade that effects this is interesting in that it is not driven by a stimulus, but by the player's estimate of the location of something that has yet to happen.
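The monocular ambiguity described above is a pure scaling effect: multiplying every distance and speed in a trajectory by the same factor leaves the retinal (angular) trajectory unchanged at every instant. A toy 2-D check (the geometry and all numbers are illustrative, not from the text, and stereopsis and looming are ignored as in Figure 91.5A):

```python
import math

def visual_angle(drop: float, dist: float) -> float:
    """Angle of the ball below eye level for a ball `dist` metres
    away that has fallen `drop` metres (simple 2-D geometry)."""
    return math.atan2(drop, dist)

K = 2.0  # scale factor: twice as far, twice as fast, twice the drop
for t in (0.05, 0.10, 0.15, 0.20):
    dist = 3.0 - 6.0 * t                # near, slow ball approaching
    drop = 0.5 * 9.81 * t ** 2          # falling under gravity
    near = visual_angle(drop, dist)
    far = visual_angle(K * drop, K * dist)  # scaled-up trajectory
    assert math.isclose(near, far)      # identical retinal angle

print("the two trajectories are monocularly indistinguishable")
```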

Figure 91.5.

Anticipating the bounce in ball games. a, To the receiver of a table tennis stroke, there is little difference in the seen trajectory of a slow short ball and a faster long ball (ignoring stereopsis and looming). The position of the bounce is needed to disambiguate the real trajectory. b, Vertical gaze movements of a batsman in cricket facing a medium-pace ball from a bowling machine. He makes a downward saccade to the approximate bounce point about 140 msec after the appearance of the ball, anticipating the bounce by almost 180 msec in this case. He then tracks the ball for another 200 msec. (From Land and McLeod, 2000.)


In cricket, where—unlike baseball—the ball also bounces before reaching the batsman, Land and McLeod (2000) found much the same thing. With fast balls, the batsmen watched the delivery and then made a saccade down to the bounce point, the eye arriving 0.1 second or more before the ball (Fig. 91.5B). They showed that with a knowledge of the time and place of the bounce, the batsman had the information he needed to judge where and when the ball would reach his bat. Slower balls involved more smooth pursuit. With good batsmen this initial saccade had a latency of only 0.14 second, whereas poor batsmen or nonbatsmen had more typical latencies of 0.2 second or more.

In baseball the ball does not bounce, and so that source of timing information is not available. Bahill and LaRitz (1984) examined the horizontal head and eye movements of batters facing a simulated fastball. Subjects used smooth pursuit involving both head and eye to track the ball to a point about 9 ft from them, after which the angular motion of the ball became too fast to track (a professional tracked it to 5.5 ft in front; he had exceptional smooth pursuit capabilities). Sometimes batters watched the ball contact the bat by making an anticipatory saccade to the estimated contact point partway through the ball's flight. This may have little immediate value in directing the bat, because the stroke is committed as much as 0.2 second before contact (McLeod, 1987), but it may be useful in learning to predict the ball's location when it reaches the bat, especially as the ball often “breaks” (changes trajectory) shortly before reaching the batter. According to Bahill and LaRitz (1984), “The success of good players is due to faster smooth-pursuit eye movements, a good ability to suppress the vestibulo-ocular reflex, and the occasional use of an anticipatory saccade.”

     
© 2010 The MIT Press