Several techniques are now available to study the functional anatomy of speech and language processing by measuring neurophysiological activity noninvasively. This entry reviews the four dominant methods: electroencephalography (EEG) and magnetoencephalography (MEG), which measure the extracranial electromagnetic field, and positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), which measure local changes in blood flow associated with active neurons. Each of these techniques has inherent strengths and weaknesses that must be taken into account when designing and interpreting experiments.
EEG and MEG measure, respectively, the electrical and magnetic fields generated by large populations of synchronously active neurons, with millisecond temporal resolution (Hamalainen et al., 1993; Nunez, 1995). Asynchronous activity cannot be easily detected because the signals produced by individual cells tend to cancel each other out rather than summing to produce a measurable signal at sensors or electrodes outside the head. The bulk of EEG and MEG signals appear to be generated not by action potentials but by postsynaptic potentials in the dendritic trees of pyramidal cells.
Although EEG has excellent temporal resolution, on the order of milliseconds, it is limited by poor spatial resolution because of the smearing of the potentials by the skull (Nunez, 1981). As a consequence, it is very difficult to identify the source of a signal from the distribution of electric potentials on the scalp. For any given surface distribution, there are many possible source distributions that might have produced the surface pattern—thus, the inverse problem has no unique solution. This complication is particularly significant where there are multiple generators, as is often the case in speech and language studies. The signals from different neural generators are mixed together in the potentials recorded at scalp electrodes.
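The nonuniqueness of the inverse problem can be illustrated with a deliberately tiny sketch. The forward model below is an assumption made purely for illustration: two scalp electrodes, three candidate sources, and made-up mixing weights (`LEAD_FIELD`). It shows two different source patterns that produce identical scalp potentials, so the scalp data alone cannot tell them apart.

```python
# Toy forward model (hypothetical weights): 2 scalp electrodes, 3 candidate sources.
# Each row says how strongly each source contributes to one electrode.
LEAD_FIELD = [
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0],
]

def forward(sources):
    """Scalp potentials produced by a given pattern of source strengths."""
    return [sum(w * s for w, s in zip(row, sources)) for row in LEAD_FIELD]

config_a = [1.0, 0.0, 1.0]  # two lateral sources active
config_b = [0.0, 2.0, 0.0]  # a single central source active

scalp_a = forward(config_a)
scalp_b = forward(config_b)

# Different generators, same surface distribution.
print(scalp_a, scalp_b)
```

With more sources than sensors, as is always the case in the real head, some such ambiguity is unavoidable unless additional constraints are imposed on the solution.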
EEG measures the electrical field produced by synchronous neural activity; MEG measures the magnetic fields associated with these electric current sources. There are important differences, however, between MEG and EEG signals. First, magnetic fields are unaffected by the tissue they pass through, so there is far less distortion of the signal between the source and the sensor than in EEG (Hamalainen et al., 1993). Second, because most MEG systems measure only the radial component of the magnetic field, MEG is effectively blind to activity in cortical areas that are oriented roughly parallel to the sensor (i.e., mostly gyral convexities). Conveniently for speech scientists, most of human auditory cortex is buried inside the sylvian fissure, making MEG ideal for recording auditory or speech-evoked fields. MEG has a temporal resolution comparable to that of EEG. Theoretically, MEG has somewhat better spatial resolution than EEG because magnetic fields pass unaffected through the tissues of the head, but this benefit is partly cancelled by the greater distance imposed between MEG sensors and the brain. Source localization in MEG is still limited by the nonuniqueness of the inverse problem, which becomes increasingly troublesome as the number of signal generators increases.
Most EEG and MEG studies in speech and language use an event-related potential (ERP) design. In such a design, the EEG recording is time-locked to the onset of an event, such as the presentation of a stimulus, and the resulting EEG response is recorded. Because the ERP signal is a small component of the overall EEG signal, the event of interest must be repeated many times (up to 100), and the responses averaged. Another increasingly popular use of electromagnetic responses involves mapping regional correlations (synchrony) in oscillatory activity during cognitive and perceptual processes (Singer, 1999); such synchrony has been suggested to reflect cross-region binding.
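Why averaging over repeated events helps can be seen in a small simulation. This is a minimal sketch under stated assumptions, not a description of any particular recording system: the evoked response is a made-up Gaussian bump, the background EEG is modeled as independent Gaussian noise, and the names `simulate_trial`, `average_epochs`, and `residual_noise_sd` are hypothetical.

```python
import math
import random

def simulate_trial(rng, n_samples=200, noise_sd=5.0):
    """One time-locked epoch: a small evoked bump buried in much larger noise."""
    epoch = []
    for t in range(n_samples):
        evoked = math.exp(-((t - 100) ** 2) / (2 * 10.0 ** 2))  # peak of 1.0 at sample 100
        epoch.append(evoked + rng.gauss(0.0, noise_sd))
    return epoch

def average_epochs(epochs):
    """Point-by-point average across time-locked trials."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def residual_noise_sd(signal):
    """Standard deviation over samples 0-49, where the simulated evoked response is ~0."""
    window = signal[:50]
    mean = sum(window) / len(window)
    return math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
epochs = [simulate_trial(rng) for _ in range(100)]

noise_single = residual_noise_sd(epochs[0])
noise_avg = residual_noise_sd(average_epochs(epochs))
print(noise_single, noise_avg)
```

Under these assumptions the noise in the average shrinks roughly with the square root of the number of trials, while the time-locked response is preserved, which is why a response too small to see in a single trial becomes visible after averaging.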
Unlike the electromagnetic recording techniques, hemodynamic techniques such as PET and fMRI measure neural activity only indirectly (Villringer, 2000). The basic phenomenon underlying these methods is that an increase in neural activity leads to an increase in metabolic demand for glucose and oxygen, which in turn appears to be fed by a localized increase in cerebral blood flow (CBF) to the active region. It is these hemodynamic reflections of the underlying neural activity that PET and fMRI measure, although in different ways.
PET measures regional CBF (rCBF) in a fairly straightforward manner (Cherry and Phelps, 1996): water (typically) is labeled with a radioactive tracer, oxygen 15; the radiolabeled material is introduced into the bloodstream, typically intravenously; metabolically active regions in the brain have an increased rate of blood delivery, and therefore receive a greater concentration of the radioactive tracer; the regional concentrations of the tracer in the brain can then be measured using a PET scanner, which detects the decay of the radioactive tracer. As the tracer material decays, positrons are emitted from the radioactive nucleus and collide with electrons. Such a collision results in annihilation of the positron and electron and the generation of two gamma rays that travel away from the site of the collision in opposite directions and exit the head. The PET scanner, which is composed of a ring of gamma ray sensors, detects the simultaneous arrival of two gamma rays on opposite sides of the sensor array, and from this information the location of the collision site can be determined. PET also measures other aspects of local energy metabolism using different labeled compounds, based on the same principle, namely, that the amount of the agent taken up is proportional to the local metabolic rate. Oxygen metabolism is measured with oxygen labeled with oxygen 15, and glucose metabolism is measured with a molecule similar to glucose called deoxyglucose labeled with fluorine 18. The spatial resolution of PET is ultimately limited by the average distance a positron travels before it collides with an electron, which is in the range of a few millimeters. In practice, however, a typical PET study has a spatial resolution of about 1 cm. The temporal resolution of PET is poor, ranging from approximately 1 minute for oxygen-based experiments to 30 minutes for glucose-based studies.
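The coincidence-detection step can be sketched as follows. Everything numerical here is an assumption chosen for illustration: a ring of 16 detectors, a 10 ns coincidence window, and a hand-written list of detection events. The sketch only pairs up gamma rays that arrive nearly simultaneously at two different detectors; each such pair defines a line through the head along which the annihilation occurred, and a real scanner combines very many such lines in a reconstruction step that is not shown.

```python
N_DETECTORS = 16   # hypothetical ring size
WINDOW_NS = 10.0   # hypothetical coincidence window, in nanoseconds

def find_coincidences(hits):
    """Pair consecutive hits that fall within the coincidence window.

    hits: list of (time_ns, detector_index) tuples, sorted by time.
    Returns a list of (detector_a, detector_b) pairs, one per detected line.
    """
    pairs = []
    i = 0
    while i < len(hits) - 1:
        (t1, d1), (t2, d2) = hits[i], hits[i + 1]
        if t2 - t1 <= WINDOW_NS and d1 != d2:
            pairs.append((d1, d2))
            i += 2  # both hits are consumed by this coincidence
        else:
            i += 1  # unpaired hit is discarded
    return pairs

# Made-up event stream: two true coincidences and one stray single hit.
hits = [(0.0, 0), (2.0, 8), (500.0, 3), (900.0, 4), (903.0, 12)]
lines = find_coincidences(hits)
print(lines)
```

In this toy stream the hit on detector 3 has no partner within the window and is dropped, while the other four hits form two lines across the ring.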
Typical PET experiments contrast rCBF maps generated in two or more experimental conditions. For example, one might contrast the rCBF map produced by listening to speech sounds with that produced in a resting baseline scan with no auditory input. Subtracting the resting baseline map from the speech-sound activation map would yield a difference map highlighting just those brain regions that show a relative increase in metabolic activity during speech perception. Many studies attempt to isolate subcomponents of a complex process by using a variety of clever control conditions rather than a resting baseline. Although this general approach has yielded important insights, it must be used cautiously because it makes several assumptions that may not hold true. One of these, the “pure insertion” assumption, is that cognitive operations are built largely of noninteracting stages, such that manipulating one stage will not affect processes occurring at another stage. This assumption has been seriously questioned, however (Sartori and Umilta, 2000). Another assumption of the subtraction method is that the component processes of interest have neural correlates that are to some extent modularly organized, and further that the modules are sufficiently spatially distinct to be detected using current methods. In some cases this assumption may be valid, but in others it may not be, so again, caution is warranted in interpreting results of subtraction-based designs. These issues arise in fMRI designs as well.
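The subtraction logic itself is simple arithmetic, as the following sketch shows. The 3 x 3 "maps" and their values are invented for illustration, and the fixed cutoff stands in for the statistical testing that an actual analysis would use; `subtract_maps` and `suprathreshold` are hypothetical helper names.

```python
def subtract_maps(activation, baseline):
    """Voxel-by-voxel difference: activation map minus baseline map."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(activation, baseline)]

def suprathreshold(diff, cutoff):
    """(row, column) positions whose increase exceeds the cutoff."""
    return [(r, c)
            for r, row in enumerate(diff)
            for c, value in enumerate(row)
            if value > cutoff]

# Made-up rCBF values (arbitrary units) on a tiny 3 x 3 slice.
rest = [[50, 50, 50],
        [50, 52, 50],
        [50, 50, 50]]
speech = [[50, 51, 50],
          [58, 52, 59],
          [50, 50, 50]]

difference_map = subtract_maps(speech, rest)
active = suprathreshold(difference_map, cutoff=5)
print(difference_map, active)
```

Note that the center voxel, which is equally active in both conditions, disappears from the difference map. This is the intended effect of subtraction, and it is also why the choice of control condition determines what the map can show.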
Experimental designs that do not rely on subtraction logic are becoming increasingly popular. Correlational studies, for example, typically scan the participants under several parametrically varied levels of a variable and look for rCBF patterns across scans that correlate with the manipulated variable. For example, one might look for brain regions that show systematic increases in rCBF as a function of increasing memory load or of increasing rate of stimulus presentation. Alternatively, it is possible in a between-subject design to look for correlations between rCBF and performance on a behavioral measure.
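A parametric analysis of this kind can be sketched with an ordinary Pearson correlation. The load levels and regional rCBF values below are invented for illustration, and a real analysis would be run at every voxel with appropriate statistical correction; the sketch only shows the core computation for two hypothetical regions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# One scan per level of the manipulated variable (e.g., memory load 1-5).
load = [1, 2, 3, 4, 5]

# Made-up rCBF values (arbitrary units) for two hypothetical regions.
region_a = [50.1, 51.0, 52.2, 52.9, 54.1]  # rises steadily with load
region_b = [50.3, 49.8, 50.4, 49.9, 50.2]  # fluctuates without trend

r_a = pearson_r(load, region_a)
r_b = pearson_r(load, region_b)
print(round(r_a, 3), round(r_b, 3))
```

Region A would be identified as load-sensitive and region B would not, without any appeal to a baseline condition or to the pure insertion assumption.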
In order to increase signal-to-noise ratios in PET studies, data from several participants are averaged. To account for individual differences in brain anatomy, each participant's PET scans are normalized to a standard stereotaxic space and spatially smoothed prior to averaging (Evans, Collins, and Holmes, 1996). Group averaged CBF maps are then overlaid onto normalized anatomical MR images for spatial localization. Group averaging does improve the signal-to-noise ratio, but it also has drawbacks. First, there is some loss of spatial resolution. This is important, not just in terms of localizing the precise site of an activation, but also in terms of the ability to detect activations in the first place: spatially smaller activations are less likely to be detected than larger ones, even if they are equally robust, simply because there is a reduced likelihood of small activations overlapping precisely in spatial location across subjects. A related drawback is that it is often hard to distinguish between a difference in activation level and a difference in spatial distribution.
fMRI is also sensitive to hemodynamic changes, but not in the same way as PET. fMRI is based on a rather surprising physiological fact: when a region of brain is activated, both CBF and the metabolic rate of oxygen increase, but the CBF increase is much larger. This means that the local venous blood is more oxygenated during activation, even though the metabolic rate of oxygen has increased. The physiological significance of this is still not understood, but one possibility is that the increased level of tissue oxygen is necessary to drive a higher oxygen flux into the mitochondria. The most commonly used fMRI technique is sensitive to these changes in the oxygen concentration of blood; this is the BOLD, or blood oxygenation level dependent, signal (Chen and Ogawa, 2000). The BOLD signal is intrinsic to the blood response, and so, unlike in PET, no radioactive tracers are needed. A typical fMRI experiment involves imaging the brain repeatedly, collecting a volume of images every few seconds for a period of several minutes, during which time the participant is presented with alternating blocks of two (or more) stimulus or task conditions. Brain areas that are differentially active in one condition versus another will show a modulation of the MR image intensity over time that correlates with the stimulus (task) cycles. Under ideal conditions the spatial resolution of fMRI comes close to the size of a cortical column, although in most applications the resolution is closer to 3–6 mm (Kim et al., 2000). The temporal resolution of fMRI is limited by the variability of the hemodynamic response. Under ideal conditions, fMRI appears to be capable of resolving stimulus onset asynchronies in the range of a few hundred milliseconds, and there is some indication that even better temporal resolution (tens of milliseconds) is possible (Bandettini, 2000). However, in most applications the temporal resolution ranges from about 1 s to tens of seconds, depending on the design.
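The idea that an active area shows intensity changes that follow the stimulus cycles can be sketched by correlating a voxel's time course with a boxcar description of the design. This is a simplified illustration under stated assumptions: the block length, number of cycles, signal change, and noise level are all invented, the delay and shape of the hemodynamic response are ignored, and `boxcar` and `pearson_r` are hypothetical helper names.

```python
import math
import random

def boxcar(n_cycles, block_len):
    """Design vector: 0 during condition A volumes, 1 during condition B volumes."""
    one_cycle = [0] * block_len + [1] * block_len
    return one_cycle * n_cycles

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

design = boxcar(n_cycles=4, block_len=5)  # 40 image volumes in total

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Made-up voxel time courses in arbitrary MR intensity units.
responsive = [100 + 2.0 * d + rng.gauss(0.0, 0.5) for d in design]
unresponsive = [100 + rng.gauss(0.0, 0.5) for _ in design]

r_responsive = pearson_r(design, responsive)
r_unresponsive = pearson_r(design, unresponsive)
print(round(r_responsive, 2), round(r_unresponsive, 2))
```

In practice the boxcar is usually shifted or smoothed to account for the sluggish hemodynamic response before it is compared with the data, which is one reason the variability of that response limits temporal resolution.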
Because fMRI measures intrinsic signals, it is possible to present the stimulus (task) conditions in short alternating blocks, or even as a series of individual events, within a single scan, unlike a PET study (Aguirre, 2000). Typical block design experiments present four to eight cycles of alternating blocks of different stimulus (task) conditions per scan. Event-related fMRI designs, which are modeled on ERP experiments, present stimuli individually rather than in blocks. This affords greater flexibility in experimental design: items consisting of different conditions can be randomly intermixed to decrease predictability of the upcoming items, and the blood responses to items can be sorted and averaged in a variety of ways, for example by accuracy or reaction time of simultaneously collected behavioral responses. The disadvantages of event-related designs include decreased amplitude of the response due to shorter stimulus durations, and increased sensitivity to regional differences in response onset, which can provide better information in the temporal domain but also can make it difficult to model the hemodynamic response equally well across all activated brain regions.
fMRI is more sensitive than PET, allowing the detection of reliable signals in individual subjects. Despite this distinct advantage, most fMRI analyses are modeled on PET procedures, with spatial normalization of individual data sets, group averaging, and overlaying of activation maps onto normalized anatomical images. Also, as in PET experiments, most fMRI experiments utilize subtraction-based designs or correlational methods. Although fMRI has several advantages over PET, it also has several drawbacks related primarily to artifacts introduced into the signal from head motion, physiological noise (respiration, cardiac pulsations), and inhomogeneities in the magnetic field coming from a variety of sources. Another drawback, particularly relevant for speech/language studies, is the greater than 100 dB noise generated by the magnet during image acquisition. A potentially promising solution to this latter problem involves presenting auditory stimuli during silent periods between image acquisitions (Hall et al., 1999).