A Guided Tour of Brain Theory and Neural Networks
Michael A. Arbib
Introduction: How to Use Part II
Part II provides a guided tour of our subject in the form of 22 road maps, each of which provides an overview of a single theme in brain theory and neural networks and offers a précis of Part III articles related to that theme. The road maps are grouped under eight general headings:
Grounding Models of Neurons and Networks
Brain, Behavior, and Cognition
Psychology, Linguistics, and Artificial Intelligence
Biological Neurons and Networks
Dynamics and Learning in Artificial Networks
Sensory Systems
Motor Systems
Applications, Implementations, and Analysis
Part II starts with the meta-map (Section II.1), which is designed to give some sense of the diversity yet interconnectedness of the themes taken up in this Handbook by quickly surveying the 22 different road maps. We then offer eight sections, one for each of the above headings, that comprise the 22 road maps. In the road maps, we depart from the convention used elsewhere in this text whereby titles in capitals and small capitals are used for cross-references to all other articles. In the road maps, we reserve capitals and Small Capitals for articles on the tour, and we use titles in quotation marks to refer to related articles that are not primary to the current road map. We will use boldface type to refer to road maps and other major sections in Part II.
Every article in Part III occurs in at least one road map, and a few articles appear in two or even three road maps. Clearly, certain articles unequivocally have a place in a given road map, but as I considered articles that were less central to a given theme, my decisions on which articles to include became somewhat arbitrary. Thus, I invite you to read each road map for a good overview of its main themes, and then continue your exploration by browsing Part III and using the articles listed under Related Reading and the index of the Handbook to add your own personal extensions to each map.
There is no one best path for the study of brain theory and neural networks, and you should use the meta-map simply to get a broad overview that will help you choose a path that is pleasing, or useful, to you.
Grounding Models of Neurons and Networks
Grounding Models of Neurons
Grounding Models of Networks
The articles surveyed in these two road maps can be viewed as continuing the work of Part I, providing the reader with a basic understanding of the models of both biological and artificial neurons and neural networks that are developed in the 285 articles in Part III. The road maps will help each reader decide which of these articles provide necessary background for their own reading of the Handbook.
Brain, Behavior, and Cognition
Neuroethology and Evolution
Mammalian Brain Regions
Cognitive Neuroscience
The road map Neuroethology and Evolution places the following road map, Mammalian Brain Regions, in a dual perspective. First, by reviewing work on modeling neural mechanisms of the behavior of a variety of nonmammalian animals, it helps us understand the wealth of subtle neural computations available in other species, enriching our study of nervous systems that are closer to those of humans. When we focus on ethology (animal behavior), we often study the integration of perception and action, thus providing a useful complement to the many articles that focus on a subsystem in relative isolation. Second, by offering a number of articles on both biological and artificial evolution, we take the first steps in understanding the ways in which different neural architectures may emerge across many generations. Turning to the mammalian brain, we first look at Mammalian Brain Regions. We will also study the role of these brain regions in other road maps as we analyze such functions as vision, memory, and motor control. We shall see that every such function involves the “cooperative computation” of a multiplicity of brain regions. However, Mammalian Brain Regions reviews those articles that focus on a single brain region and give some sense of how we model its contribution to key neural functions. The road map Cognitive Neuroscience then pays special attention to a range of human cognitive functions, including perception, action, memory, and language, with emphasis on the range of data now available from imaging of the active human brain and the challenges these data provide for modeling.
Psychology, Linguistics, and Artificial Intelligence
Psychology
Linguistics and Speech Processing
Artificial Intelligence
Our next three road maps—Psychology, Linguistics and Speech Processing, and Artificial Intelligence—are focused more on the effort to understand human psychology than on the need to understand the details of neurobiology. For example, the articles on Psychology may overlap those on Cognitive Neuroscience, but overall the emphasis shifts to “connectionist” models in which the “neurons” rarely correspond to the actual biological neurons of the human brain (the underlying structure). Rather, the driving idea is that the functioning of the human mind (the functional expression of the brain’s activity) is best explored through a parallel, adaptive processing methodology in which large populations of elements are simultaneously active, pass messages back and forth between each other, and can change the strength of their connections as they do so. This is in contrast to the serial computing methodology, which is based on the computing paradigm that was dominant from the 1940s through the 1970s and that now is complemented in mainstream computer science by work in grid-based computing, embedded systems, and teams of intelligent agents.
In short, connectionist approaches to psychology and linguistics use “neurons” that are more like the artificial neurons used to build new applications for parallel processing than they are like the real neurons of the living brain.
In dividing this introduction to connectionism into three themes, I have first distinguished those aspects of connectionist psychology that relate to perception, memory, emotion, and other aspects of cognition in general from those specifically involved in connectionist linguistics before turning to artificial intelligence. The road map Psychology also contains articles that address philosophical issues in brain theory and connectionism, including the notion of consciousness, as well as articles that approach psychology from a developmental perspective. The road map Linguistics and Speech Processing presents connectionist models of human language performance as well as approaches (some more neural than others) to technologies for speech processing. The central idea in connectionist linguistics is that rich linguistic representations can emerge from the interaction of a relatively simple learning device and a structured linguistic environment, rather than requiring the details of grammar to be innate, captured in a genetically determined universal grammar. The road map Artificial Intelligence presents articles whose themes are similar to those in Psychology in what they explain, but are part of artificial intelligence (AI) because the attempt is to get a machine to exhibit some form of intelligent behavior, without necessarily meeting the constraints imposed by experimental psychology or psycholinguistics. “Classical” symbolic AI is contrasted with a number of methods in addition to the primary concentration on neural network approaches. The point is that, whereas brain theory seeks to know “how the brain does it,” AI must weigh the value of artificial neural networks (ANNs) as a powerful technology for parallel, adaptive computation against that of other technologies on the basis of efficacy in solving practical problems on available hardware. The reader will, of course, find a number of models that are of equal interest to psychologists and to AI researchers.
The articles gathered in these three road maps will not exhaust the scope of their subject matter, for at least two reasons. First, in addition to connectionist models of psychological phenomena, there are many biological models that embody genuine progress in relating the phenomena to known parts of the brain, perhaps even grounding a phenomenon in the behavior of identifiable classes of biological neurons. Second, while Artificial Intelligence will focus on broad thematic issues, a number of these also appear in applying neural networks in computer vision, speech recognition, and elsewhere using techniques elaborated in articles of the road map Learning in Artificial Networks.
Biological Neurons and Networks
Biological Neurons and Synapses
Neural Plasticity
Neural Coding
Biological Networks
The next four road maps, Biological Neurons and Synapses, Neural Plasticity, Neural Coding, and Biological Networks, are ones that, for many readers, may provide the appropriate entry point for the book as a whole, namely, an understanding of neural networks from a biological point of view. The road map Biological Neurons and Synapses gives us some sense of how sophisticated real biological neurons are, with each patch of membrane being itself a subtle electrochemical structure. An appreciation of this complexity is necessary for the computational neuroscientist wishing to address the increasingly detailed database of experimental neuroscience on how signals can be propagated, and how individual neurons interact with each other. But such complexity may also provide an eye-opener for the technologist planning to incorporate new capabilities into the next generation of ANNs. The road map Neural Plasticity then charts from a biological point of view a variety of specific mechanisms at the level of synapses, or even finer-grained molecular structures, which enable the changes in the strength of connections that underlie both learning and development. A number of such mechanisms have already inspired a variety of learning rules for ANNs (see Learning in Artificial Networks), but they also include mechanisms that have not seen technological use. This road map includes articles that analyze mechanisms that underlie both development and regeneration of neural networks and learning in biological systems. However, I again stress to the reader that one may approach the road maps, and the articles in Part III of this Handbook, in many different orders, so that some readers may prefer to study the articles described in the road map Learning in Artificial Networks before or instead of studying those on neurobiological learning mechanisms.
Two more road maps round out our study of Biological Neurons and Networks. The simplest models of neurons either operate on a discrete-time scale or measure neural output by the continuous variation in firing rate. The road map Neural Coding examines the virtues of other alternatives, looking at both the possible gains in information that may follow from exploiting the exact timing of spikes (action potentials) as they travel along axonal branches from one neuron to many others, and the way in which signals that may be hard to discern from the firing of a single neuron may be reliably encoded by the activity of a whole population of neurons. We then turn to articles that chart a number of the basic architectures whereby biological neurons are combined into Biological Networks—although clearly, this is a topic expanded upon in many articles in Part III which are not explicitly presented in this road map.
Dynamics and Learning in Artificial Networks
Dynamic Systems
Learning in Artificial Networks
Computability and Complexity
The next three road maps—Dynamic Systems, Learning in Artificial Networks, and Computability and Complexity—provide a broad perspective on the dynamics of neural networks considered as general information processing structures rather than as models of a particular biological or psychological phenomenon or as solutions to specific technological problems. Our study of Dynamic Systems is grounded in studying the dynamics of a neural network with fixed inputs: does it settle down to an equilibrium state, and to what extent can that state be seen as the solution of some problem of optimization? Under what circumstances will the network exhibit a dynamic pattern of oscillatory behavior (a limit cycle), and under what circumstances will it undergo chaotic behavior (traversing what is known as a strange attractor)? This theme is expanded by the study of cooperative phenomena. In a gas or a magnet, we do not know the behavior of any single atom with precision, but we can infer the overall “cooperative” behavior—the pressure, volume, and temperature of a gas, or the overall magnetization of a magnet—through statistical methods, methods which even extend to the analyses of such dramatic phase transitions as that of a piece of iron from an unmagnetized lump to a magnet, or of a liquid to a gas. So, too, can statistical methods provide insight into the large-scale properties of neural nets, abstracting away from the detailed function of individual neurons, when our interest is in statistical patterns of behavior rather than the fine details of information processing. This leads us to the study of self-organization in neural networks, in which we ask for ways in which the interaction between elements in a neural network can lead to the spontaneous expression of pattern; whether this pattern is constituted by the pattern of activity of the individual neurons or by the pattern of synaptic connections which records earlier experience.
With this question of earlier experience, we have fully made the transition to the study of learning, and we turn to the road map which focuses on Learning in Artificial Networks, complementing the road map Neural Plasticity. (This replaces two road maps from the first edition—Learning in Artificial Neural Networks, Deterministic, and Learning in Artificial Neural Networks, Statistical—for two reasons: (1) the use of statistical methods in the study of learning in ANNs is so pervasive that the attempt to distinguish deterministic and statistical approaches to learning is not useful, and (2) the statistical analysis of learning in ANNs has spawned a variety of statistical methods that are less closely linked to neurobiological inspiration, and we wish these, too, to be included in our road map.) The study of Computability and Complexity then provides a rapprochement between neural networks and a number of ideas developed within the mainstream of computer science, especially those arising from the study of complexity of computational structures. Indeed, it takes us back to the very foundations of the theory of neural networks, in which the study of McCulloch-Pitts neurons built on earlier work on computability to inspire the later development of automata theory.
Sensory Systems
Vision
Other Sensory Systems
Vision has been the most widely studied of all sensory systems, both in brain theory and in applications and analysis of ANNs, and thus has a special road map of its own. Other Sensory Systems, treated at less length in the next road map, include audition, touch, and pain, as well as a number of fascinating special systems such as electrolocation in electric fish and echolocation in bats.
Motor Systems
Robotics and Control Theory
Motor Pattern Generators
Mammalian Motor Control
The next set of road maps—Robotics and Control Theory, Motor Pattern Generators, and Mammalian Motor Control—addresses the control of movement by neural networks. In the study of Robotics and Control Theory, the adaptive properties of neural networks play a special role, enabling a control system, through experience, to become better and better suited to solve a given repertoire of control problems, guiding a system through a desired trajectory, whether through the use of feedback or feedforward. These general control strategies are exemplified in a number of different approaches to robot control. The articles in the road map Motor Pattern Generators focus on subconscious functions, such as breathing or locomotion, in vertebrates and on a wide variety of pattern-generating activity in invertebrates. The reader may wish to turn back to the road map Neuroethology and Evolution for other studies in animal behavior (neuroethology) which show how sensory input, especially visual input, and motor behavior are integrated in a cycle of action and perception. Mammalian Motor Control places increased emphasis on the interaction between neural control and the kinematics or dynamics of limbs and eyes, and also looks at various forms of motor-related learning. In showing how the goals of movement can be achieved by a neural network through the time course of activity of motors or muscles, this road map overlaps some of the issues taken up in the more applications-oriented road map, Robotics and Control Theory. Much of the material on biological motor control is of general relevance, but the road map also includes articles on primate motor control that examine a variety of movements of the eyes, head, arm, and hand which are studied in a variety of mammals but are most fully expressed in primates and humans. Of course, as many readers will be prepared to notice by now, Mammalian Motor Control will, for some readers, be an excellent starting place for their study, since, by showing how visual and motor systems are integrated in a number of primate and human behaviors, it motivates the study of the specific neural network mechanisms required to achieve these behaviors.
Applications, Implementations, and Analysis
Applications
Implementation and Analysis
We then turn to a small set of Applications of neural networks, which include signal processing, speech recognition, and visual processing (but exclude the broader set of applications to astronomy, speech recognition, high-energy physics, steel making, telecommunications, etc., of the first edition, since The Handbook of Neural Computation [Oxford University Press, 1996] now provides a large set of articles on ANN applications). Since a neural network cannot be applied unless it is implemented, whether in software or hardware, we close with the road map Implementation and Analysis. The implementation methodologies include simulation on a general-purpose computer, emulation on specially designed neurocomputers, and implementation in a device built with electronic or photonic materials. As for analysis, we present articles in the nascent field of neuroinformatics which combines database methodology, visualization, modeling, and data analysis in an attempt to master the explosive growth of neuroscience data. (In Europe, the term neuroinformatics is used to encompass the full range of computational approaches to brain theory and neural networks. In the United States, some people use neuroinformatics to refer solely to the use of databases in neuroscience. Here we focus on the middle ground, where the analysis of data and the construction of models are brought together.)
The first two road maps expand the exposition of Part I by presenting basic models of neurons and networks that provide the building blocks for many of the articles in Part III.
Grounding Models of Neurons
Axonal Modeling
Dendritic Processing
Hebbian Synaptic Plasticity
Perceptrons, Adalines, and Backpropagation
Perspective on Neuron Model Complexity
Reinforcement Learning
Single-Cell Models
Spiking Neurons, Computation with
This road map introduces classes of neuron models of increasing complexity and attention to detail. The point is that much can be learned even at high degrees of abstraction, while other phenomena can be understood only by attention to subtle details of neuronal function. The reader of this Handbook will find many articles exploring biological phenomena and technological applications at different levels of complexity. The implicit questions will always be, “Do all the details matter?” and “Is the model oversimplified?” The answers will depend both on the phenomena under question and on the current subtlety of experimental investigations. After introducing articles that present neuron models across the range of model complexity, the road map concludes with a brief look at the most widely analyzed forms of synaptic plasticity.
Classes of neuron models can be defined by how they treat the train of action potentials issued by a neuron (see the road map Neural Coding). Many models assume that information is carried in the average rate of pulses over a time much longer than a typical pulse width, with the occurrence times of particular pulses simply treated as jitter on an averaged analog signal. A neural model in such a theory might be a mathematical function which produces a real-valued output from its many real-valued inputs; that function could be linear or nonlinear, static or adaptive, and might be instantiated in analog silicon circuits or in digital software. Examples given of such models in Single-Cell Models are the McCulloch-Pitts model, the perceptron model, Hopfield neurons, and polynomial neurons. However, some models assume that each single neural pulse carries reliable, precisely timed information. A neural model in such a theory fires only upon the exact coincidence of several input pulses, and quickly “forgets” when it last fired, so that it is always ready to fire upon another coincidence. The simplest such models are the integrate-and-fire models. The article concludes by briefly introducing the Hodgkin-Huxley model of the squid axon, based on painstaking analysis (without benefit of electronic computers) of data from the squid giant axon, and by sketching modified single-point models, compartmental models, and computation with both passive dendrites and active dendrites. Spiking Neurons, Computation with provides more detail on those neuron models of intermediate complexity in which the output is a spike whose timing is continuously variable as a result of cellular interactions, providing a model of biological neurons that offers more details than firing rate models but without the details of biophysical models. The virtues of such models include the ability to transmit information very quickly through small temporal differences between the spikes sent out by different neurons. Information theory can be used to quantify how much more information about a stimulus can be extracted from spike trains if the precise timing is taken into account. Moreover, computing with spiking neurons may prove of benefit for technology.
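The simplest spiking model mentioned above, the leaky integrate-and-fire neuron, can be sketched in a few lines of Python. This is an illustrative sketch, not a model from any Part III article; all parameter values and the Euler integration scheme are choices made here for clarity:

```python
def lif_spike_times(i_ext, tau=20.0, v_rest=-65.0, v_thresh=-50.0,
                    v_reset=-65.0, r_m=10.0, dt=0.1, t_max=200.0):
    """Leaky integrate-and-fire neuron: forward-Euler integration of
    tau * dV/dt = -(V - v_rest) + r_m * i_ext.
    Returns the times (ms) of threshold crossings. Parameter values
    are illustrative, not fitted to any biological neuron."""
    v = v_rest
    spikes = []
    for step in range(int(t_max / dt)):
        v += dt * (-(v - v_rest) + r_m * i_ext) / tau
        if v >= v_thresh:            # threshold crossing: emit a spike...
            spikes.append(step * dt)
            v = v_reset              # ...and reset, "forgetting" the past
    return spikes
```

With a constant suprathreshold input the model fires regularly; with no input it never fires. Richer variants add refractory periods, noise, and synaptic conductances, moving toward the intermediate-complexity spiking models discussed in Spiking Neurons, Computation with.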
Axonal Modeling is centered on the Hodgkin and Huxley model, arguably the most successful model in all of computational neuroscience. The article shows how the Hodgkin-Huxley equations extend the cable equation to describe the ionic mechanisms underlying the initiation and propagation of action potentials. The vast majority of contemporary biophysical models use a mathematical formalism similar to that introduced by Hodgkin and Huxley, even though their model of the continuous, deterministic, and macroscopic permeability changes of the membrane was achieved without any knowledge of the underlying all-or-none, stochastic, and microscopic ionic channels. The article also describes the differences between myelinated and nonmyelinated axons, and briefly discusses the possible role of heavily branched axonal trees in information processing.
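For readers who want to see the formalism itself, here is a minimal, self-contained sketch of the space-clamped Hodgkin-Huxley equations in Python. The parameters are the standard textbook values for the squid giant axon (in modern sign conventions); the integration scheme, step size, and function names are choices of this sketch, not of the article:

```python
import math

def _vtrap(x, scale):
    """x / (1 - exp(-x/scale)), with its limit value at x ~ 0."""
    return scale if abs(x) < 1e-9 else x / (1.0 - math.exp(-x / scale))

def hh_voltage_trace(i_ext=10.0, dt=0.01, t_max=50.0):
    """Forward-Euler integration of the space-clamped Hodgkin-Huxley
    equations. i_ext in uA/cm^2; returns the voltage trace in mV."""
    c_m, g_na, g_k, g_l = 1.0, 120.0, 36.0, 0.3   # uF/cm^2, mS/cm^2
    e_na, e_k, e_l = 50.0, -77.0, -54.387          # reversal potentials, mV
    v, m, h, n = -65.0, 0.05, 0.6, 0.32            # approximate resting state
    trace = []
    for _ in range(int(t_max / dt)):
        # Voltage-dependent opening/closing rates of the gating variables.
        a_m = 0.1 * _vtrap(v + 40.0, 10.0)
        b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
        a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
        b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
        a_n = 0.01 * _vtrap(v + 55.0, 10.0)
        b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        # Current balance: C dV/dt = I_ext - I_Na - I_K - I_leak.
        i_na = g_na * m ** 3 * h * (v - e_na)
        i_k = g_k * n ** 4 * (v - e_k)
        i_l = g_l * (v - e_l)
        v += dt * (i_ext - i_na - i_k - i_l) / c_m
        trace.append(v)
    return trace
```

A sustained current of roughly 10 µA/cm² drives repetitive firing in this model. Axonal propagation adds the spatial (cable) term that this space-clamped sketch omits.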
Perspective on Neuron Model Complexity then shows how this type of modeling might be extended to the whole cell. The key point is that one neuron with detailed modeling of dendrites (especially with nonuniform distributions of synapses and ion channels) can perform tasks that would require a network of many simple binary units to duplicate. The point is not to choose the most complex model of a neuron but rather to seek an intermediate level of complexity which preserves the most significant distinctions between different “compartments” of the neuron (soma, various portions of the dendritic tree, etc.). The challenge is to demonstrate a useful computation or discrimination that can be accomplished with a particular choice of compartments in a neuron model, and then show that this useful capacity is lost when a coarser decomposition of the neuron is used. Dendritic Processing especially emphasizes developments in compartmental modeling of dendrites, arguing that we are in the midst of a “dendritic revolution” that has yielded a much more fascinating picture of the electrical behavior and chemical properties of dendrites than one could have imagined only a few years ago. The dendritic membrane hosts a variety of nonlinear voltage-gated ion channels that endow dendrites with potentially powerful computing capabilities. Moreover, the classic view of dendrites as carrying information unidirectionally, from synapses to the soma, has been transformed: dendrites of many central neurons also carry information in the “backward” direction, via active propagation of the action potentials from the axon to the dendrites. These “reversed” signals can trigger plastic changes in the dendritic input synapses. Moreover, it is now known that the fine morphology as well as the electrical properties of dendrites change dynamically, in an activity-dependent manner.
If the most successful model in all of computational neuroscience is the Hodgkin-Huxley model, then the second most successful is Hebb’s model of “unsupervised” synaptic plasticity. The former was based on rigorous analysis of empirical data; the latter was initially the result of theoretical speculation on how synapses might behave if assemblies of cells were to work together to store and reconstitute thoughts and associations. Hebbian Synaptic Plasticity notes that predictions derived from Hebb’s postulate can be generalized for different levels of integration (synaptic efficacy, functional coupling, adaptive change in behavior) by simply adjusting the variables derived from various measures of neural activity and the time-scale over which it operates. The article addresses five major issues: Should the definition of “Hebbian” plasticity refer to a simple positive correlational rule of learning, or are there biological justifications for including additional “pseudo-Hebbian” terms (such as synaptic depression due to disuse or competition) in a generalized phenomenological algorithm? What are the spatiotemporal constraints (e.g., input specificity, temporal associativity) that characterize the induction process? Do the predictions of Hebbian-based algorithms account for most forms of activity-dependent dynamics in synaptic transmission throughout phylogenesis? On which time-scales (perception, learning, epigenesis) and at which stage of development of the organism (embryonic, “critical” postnatal developmental periods, adulthood) are activity-dependent changes in functional links predicted by Hebb’s rule? Are there examples of correlation-based plasticity that contradict the predictions of Hebb’s postulate (termed anti-Hebbian modifications)? 
The article thus frames many important issues that are developed in the articles of the road map Neural Plasticity but are also implicit, for example, in articles reviewed in the road maps Psychology and Linguistics and Speech Processing, in which Hebbian (and other) learning rules are used for “formal neurons” that are psychological abstractions rather than representations of real neurobiological neurons or even biological neuron pools. Two other articles serve to introduce the basic learning rules that have been most central in both biological analysis and connectionist modeling. Supervised learning adjusts the weights in an attempt to respond to explicit error signals provided by a “teacher,” which may be external, or another network in the same “brain.” This form of learning was introduced with the perceptron, which is reviewed in Perceptrons, Adalines, and Backpropagation (described in more detail in the next road map, Grounding Models of Networks). On the other hand, Reinforcement Learning (described in more detail in the road map Learning in Artificial Networks) shows how networks can improve their performance when given general reinforcement (“that was good,” “that was bad”) by a critic, rather than the explicit error information offered by a teacher.
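As a baseline for those discussions, the simple positive correlational rule itself fits in a few lines of Python. In this illustrative sketch (learning rate, weights, and pattern are all invented here), a linear unit's weights grow in proportion to the product of presynaptic and postsynaptic activity:

```python
def hebbian_step(w, x, lr=0.01):
    """One step of the plain Hebbian rule for a linear unit:
    delta w_i = lr * y * x_i, where y is the unit's output."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * xi for wi, xi in zip(w, x)], y

# Repeated presentation of the same pattern strengthens the response.
w = [0.1, 0.1, 0.1]
pattern = [1.0, 0.0, 1.0]
responses = []
for _ in range(10):
    w, y = hebbian_step(w, pattern)
    responses.append(y)
```

Note the positive feedback: without a decay, competition, or normalization term, the weights grow without bound, which is exactly why the "pseudo-Hebbian" amendments discussed above are biologically and computationally important.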
Grounding Models of Networks
Associative Networks
Computing with Attractors
Perceptrons, Adalines, and Backpropagation
Radial Basis Function Networks
Self-Organizing Feature Maps
Spiking Neurons, Computation with
The mechanisms and implications of association—the linkage of information with other information—have a long history in psychology and philosophy. Associative Networks discusses association as realized in neural networks as well as association in the more traditional senses. Many neural networks are designed as pattern associators, which link an input pattern with the “correct” output pattern. Learning rules are designed to construct useful linkages between input and output patterns whether in feedforward neural network architectures or in a network whose units are recurrently interconnected. Special attention is given to the critical importance of data representation at all levels of network operation. Perceptrons, Adalines, and Backpropagation introduces the perceptron rule and the LMS (least-mean-squares) algorithm for training feedforward networks with multiple adaptive elements, where each element can be seen as an adaptive linear combiner of its inputs followed by a nonlinearity which produces the output. It then presents the major extension provided by the backpropagation algorithm for training multilayer neural networks—which can be viewed as dividing the input space into regions bounded by hyperplanes, one for the thresholded output of each neuron of the output layer—and shows how this technique has been used to attack problems requiring neural networks with high degrees of nonlinearity and precision.
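The flavor of the perceptron rule can be conveyed in a short sketch (Python; the learning rate, epoch limit, and toy data are illustrative choices, not taken from the article). Weights change only when the unit misclassifies, being nudged toward or away from the offending input:

```python
def perceptron_train(samples, lr=0.1, epochs=100):
    """Classic perceptron rule for a single threshold unit.
    samples: list of (input_vector, target) with target in {-1, +1}."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            out = 1 if s >= 0 else -1
            if out != target:              # update only on a mistake
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
                errors += 1
        if errors == 0:                    # converged on separable data
            break
    return w, b

def perceptron_predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

On linearly separable data this rule provably converges in finitely many updates; LMS and backpropagation, described in the article, replace the all-or-none correction with a gradient step on a continuous error measure, which is what makes multilayer training possible.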
Computing with Attractors shows how neural networks (often seen now as operating in continuous time) may be viewed as dynamic systems (a theme developed in great detail by the articles of the road map Dynamic Systems). This article describes how to compute with networks with feedback, with the input of a computation being set as an initial state for the system and the result read off a suitably chosen set of units when the network has “settled down.” The state a dynamical system settles into is called an attractor, so this paradigm is called computing with attractors. It is possible to settle down into an equilibrium state, or into periodic or even chaotic patterns of activity. (An interesting possibility, not considered in this article, is to perform computations based on the transient approach to the attractor, rather than on the basis of the attractor alone.) The Hopfield model for associative memory is used as the key example, showing its dynamic behavior as well as how the connections necessary to embed desired patterns can be learned and how the paradigm can be extended to time-dependent attractors.
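A minimal sketch of this paradigm in Python (the synchronous update schedule and pattern sizes are choices of the sketch; the original model updates units asynchronously) stores patterns with the Hebbian outer-product rule and lets a corrupted probe settle toward a stored attractor:

```python
def hopfield_recall(patterns, probe, steps=10):
    """Hopfield associative memory. Weights come from the Hebbian
    outer-product rule; repeated threshold updates let the probe
    state settle toward a stored attractor. States are lists of +/-1."""
    n = len(probe)
    # Outer-product learning rule with zero diagonal (no self-connections).
    w = [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
          for j in range(n)] for i in range(n)]
    s = list(probe)
    for _ in range(steps):
        s = [1 if sum(w[i][j] * s[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
    return s
```

Flipping a few bits of a stored pattern and presenting the result as the probe typically recovers the original, provided the number of stored patterns stays small relative to the number of units (roughly 0.14N for random patterns).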
Self-Organizing Feature Maps (SOFMs) introduces a famous version of competitive learning based on a layer of adaptive “neurons” that gradually develops into an array of feature detectors. The learning method is an augmented Hebbian method in which learning by the element most responsive to an input pattern is “shared” with its neighbors. The result is that the resulting “compressed image” of the (usually higher-dimensional) input space forms a “topographic map” in which distance relationships in the input space (expressing, e.g., pattern similarities) are approximately preserved as distance relationships between corresponding excitation sites in the map, while clusters of similar input patterns tend to become mapped to areas in the neural layer whose size varies in proportion to the frequency of the occurrence of their patterns. From a statistical point of view, the SOFM provides a nonlinear generalization of principal component analysis.
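The algorithm behind this picture is short enough to sketch directly (Python, scalar inputs, a 1-D map; the decay schedules and all constants are illustrative choices of this sketch, not Kohonen's original values):

```python
import math
import random

def train_sofm(data, n_units=10, epochs=200, lr0=0.5, sigma0=3.0, seed=0):
    """Tiny 1-D self-organizing feature map over scalar inputs.
    The winning unit and its neighbors move toward each sample;
    the learning rate and neighborhood width both shrink over time."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_units)]      # random initial weights
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)     # shrinking neighborhood
        for x in data:
            win = min(range(n_units), key=lambda i: abs(w[i] - x))
            for i in range(n_units):
                h = math.exp(-((i - win) ** 2) / (2 * sigma ** 2))
                w[i] += lr * h * (x - w[i])         # shared, neighborhood-weighted step
    return w
```

After training on data spread over an interval, each unit's weight settles on a region of the input space, neighboring units ending up with neighboring values (the "topographic map" described above), and more units are allocated where input patterns occur more often.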
Spiking Neurons, Computation with discusses both the use of spiking neurons as a useful approximation to biological neurons and the study of networks of spiking neurons as a formal model of computation for which the assumptions need not be biological (see also “Integrate-and-Fire Neurons and Networks”). If the spiking neurons are not subject to significant amounts of noise, then one can carry out computations in networks of spiking neurons where every spike matters, and some finite network of spiking neurons can simulate a universal Turing machine. Spiking neurons can also be used as computational units that function like radial basis functions in the temporal domain. Another code uses the order of firing of different neurons as the relevant signal conveyed by these neurons. Firing rates of neurons in the cortex are relatively low, making it hard for the postsynaptic neuron to “read” the firing rate of a presynaptic neuron. However, networks of spiking neurons can carry out complex analog computations if the inputs of the computation are presented in terms of a space rate or population code.
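A leaky integrate-and-fire unit is the simplest member of this family of models; the time constant, threshold, and input values below are arbitrary illustrative choices (see “Integrate-and-Fire Neurons and Networks” for the models themselves):

```python
import numpy as np

def lif_spike_times(input_current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire unit: leaky integration of the input,
    with a spike emitted (and the potential reset) at threshold."""
    v, spikes = 0.0, []
    for step, current in enumerate(input_current):
        v += (dt / tau) * (current - v)     # leaky integration
        if v >= v_thresh:                   # threshold crossing -> spike
            spikes.append(step * dt)
            v = v_reset
    return spikes

# A constant suprathreshold input yields regularly spaced spikes, so both
# the firing rate and the precise spike times are available as signals.
spikes = lif_spike_times([1.5] * 1000)      # 1 second at 1 ms resolution
```

In a noise-free regime like this one, the exact spike times (not just the rate) carry information, which is the setting in which “every spike matters.”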
The last article in this road map gives an example of the utility of studying networks in which the response properties of the individual units are designed not as abstractions from biological neurons, but rather because their response functions have mathematically desirable properties. A multilayer perceptron can be viewed as dividing the input space into regions bounded by hyperplanes, one for the thresholded output of each neuron of the first hidden layer. Radial Basis Function Networks describes an alternative approach to decomposition of a pattern space into regions, describing the clusters of data points in the space as if they were generated according to an underlying probability density function. Thus the perceptron method concentrates on class boundaries, while the radial basis function approach focuses on regions where the data density is highest, constructing global approximations to functions using combinations of basis functions centered around weight vectors. The article shows that this approach not only has a range of useful theoretical properties but also is practically useful, having been applied efficiently to problems in discrimination, time-series prediction, and feature extraction.
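The basis-function decomposition can be illustrated with a toy function approximation; the Gaussian width, the fixed grid of centers, and the sine target are assumptions made for illustration (practical RBF networks typically place centers by clustering the data):

```python
import numpy as np

def rbf_design(X, centers, width):
    """Matrix of Gaussian basis-function activations, one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * width**2))

# Approximate y = sin(x) by a weighted sum of local Gaussian bumps:
# each basis function covers a region of the input space, and the
# output weights follow from linear least squares.
X = np.linspace(0, 2 * np.pi, 50)[:, None]
y = np.sin(X[:, 0])
centers = np.linspace(0, 2 * np.pi, 10)[:, None]   # fixed grid of centers here
Phi = rbf_design(X, centers, width=0.7)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
pred = Phi @ w                                     # close approximation to sin(x)
```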
Command Neurons and Command Systems
Crustacean Stomatogastric System
Echolocation: Cochleotopic and Computational Maps
Electrolocation
Evolution and Learning in Neural Networks
Evolution of Artificial Neural Networks
Evolution of Genetic Networks
Evolution of the Ancestral Vertebrate Brain
Hippocampus: Spatial Models
Language Evolution and Change
Language Evolution: The Mirror System Hypothesis
Locomotion, Vertebrate
Locust Flight: Components and Mechanisms in the Motor
Motor Primitives
Neuroethology, Computational
Olfactory Cortex
Scratch Reflex
Sensorimotor Interactions and Central Pattern Generators
Sound Localization and Binaural Processing
Spinal Cord of Lamprey: Generation of Locomotor Patterns
Visual Course Control in Flies
Visuomotor Coordination in Frog and Toad
Visuomotor Coordination in Salamander
Many readers will come to the Handbook with one of two main motivations: to understand the human brain, or to explore the potential of ANNs as a technology for adaptive, parallel computation. The present road map emphasizes a third motivation: to study neural mechanisms in creatures very different from humans and their mammalian cousins—for the intrinsic interest of discovering the diverse neural architectures that abound in nature, for the suggestions these provide for future technology, and for the novel perspective on human brain mechanisms offered by seeking to place them in an evolutionary perspective.
Ethology is the study of animal behavior, in which our concern is with the circumstances under which a particular motor pattern will be deployed as an appropriate part of the animal’s activity. Neuroethology, then, is the study of neural mechanisms underlying animal behavior. The emphasis is thus on an integrative, systems approach to the neuroscience of the animal being studied, as distinct from a reductionist approach to, for example, the neurochemistry of synaptic plasticity. Of course, a major aim of this Handbook is to create a context in which the reader can see both approaches to the study of nervous systems and ponder how best to integrate them. In particular, the reader will find many examples of the neuroethology of mammalian systems in a wide variety of other road maps, such as Cognitive Neuroscience, Vision, Other Sensory Systems, and Mammalian Motor Control. However, the present road map is designed to guide the reader to articles on a number of fascinating nonmammalian systems—as well as a few “exotic” mammalian systems—as a basis for a brief introduction to the evolutionary approach to biological and artificial neural networks.
Neuroethology, Computational suggests that computational neuroethology applies not only to animals but also to nonbiological autonomous agents, such as some types of robots and simulated embodied agents operating in virtual worlds (see also “Embodied Cognition”). The key element is the use of sophisticated computer-based simulation and visualization tools to study the neural control of behavior within the context of “agents” that are both embodied and situated within an environment. Other examples include specific neuroethological modeling directed toward specific animals (the computational frog Rana computatrix and the computational cockroach Periplaneta computatrix) and their implications for the rebirth of ideas first introduced by Grey Walter in his 1950s design of Machina speculatrix and later developed in the book Vehicles by Valentino Braitenberg.
If a certain interneuron is stimulated electrically in the brain of a marine slug, the animal then displays a species-specific escape swimming behavior, although no predator is present. If in a toad a certain portion of the optic tectum is stimulated in this manner, snapping behavior is triggered, although no prey is present. In both cases, a stimulus produces a rapid ballistic response. Such command functions provide the sensorimotor interface between sensory pattern recognition and localization, on the one side, and motor pattern generation on the other. Command Neurons and Command Systems analyzes the extent to which a motor pattern generator (MPG) may be activated alone or in concert with others through perceptual stimuli mediated by a single “command neuron” (as in the marine slug) or by more diffuse “command systems” (as in the toad). Three articles then focus specifically on visuomotor coordination. Visual Course Control in Flies explains the mechanisms underlying the extraction of retinal motion patterns in the fly, and their transformation into the appropriate motor activity. Rotatory large-field motion can signal unintended deviations from the fly’s course and initiate a compensatory turn; image expansion can signal that the animal is approaching an obstacle and initiate a landing or avoidance response; and discontinuities in the retinal motion field indicate nearby stationary or moving objects. Since many of the cells responsible for motion extraction are large and individually identifiable, the fly is quite amenable to an analysis of sensory processing. Similarly, the small number of muscles and motor neurons used to generate flight maneuvers facilitates studies of motor output. Visuomotor Coordination in Salamander shows how low-level mechanisms add up to produce complicated behaviors, such as the devious approach of salamanders to their prey.
Coarse coding models demonstrate how the location of an object may be encoded with high accuracy using only a few neurons with large, overlapping receptive fields. (This fits with the fact that the brains of salamanders are anatomically the simplest among vertebrates, containing only about 1 million neurons; frogs have up to 10 times as many neurons, and humans roughly 100,000 times as many.) The models have been extended to the case where several objects are presented to the animal by linking a segmentation network and a winner-take-all-like object selection network to the coarse coding network in a biologically plausible way. Compensation of background movement, selection of an object, saccade generation, and approach and snapping behavior in salamanders have also been modeled successfully, in line with behavioral and neurobiological findings. Again, Visuomotor Coordination in Frog and Toad stresses that visuomotor integration implies a complex transformation of sensory data, since the same locus of retinal activation might release behavior directed toward the stimulus (as in prey catching) or toward another part of the visual field (as in predator avoidance). The article also shows how the efficacy of visual stimuli in releasing a response is determined by many factors, including the stimulus situation, the motivational state of the organism, previous experience with the stimulus (learning and conditioning), and the physical condition of the animal’s CNS (e.g., brain lesions). In addition, other types of sensory signals can modulate frogs’ and toads’ responses to certain moving visual stimuli. For example, the efficacy of a visual stimulus may be greatly enhanced by the presence of prey odor.
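The coarse coding idea can be sketched in one dimension; the Gaussian tuning curves, the five preferred locations, and the response-weighted-average decoder below are illustrative assumptions, not the salamander model itself:

```python
import numpy as np

def population_decode(stimulus, centers, sigma):
    """Encode a 1-D location with broad, overlapping Gaussian receptive
    fields, then decode it as the response-weighted mean of the
    neurons' preferred locations."""
    responses = np.exp(-((stimulus - centers) ** 2) / (2 * sigma**2))
    return (responses @ centers) / responses.sum()

# Five neurons with large, overlapping fields still localize a stimulus
# to within a small fraction of the receptive field width.
centers = np.array([0.0, 2.5, 5.0, 7.5, 10.0])   # preferred locations
estimate = population_decode(3.1, centers, sigma=2.0)
```

Even though each receptive field here is as wide as the spacing between neurons, the overlap in their responses pins down the stimulus far more precisely than any single neuron could.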
Motor Primitives and Scratch Reflex are two of the articles on nonmammalian animal behaviors that are described more fully in the road map Motor Pattern Generators. These articles examine the behavior elicited in frogs and turtles, respectively, by an irritant applied to the animal’s skin. The former article examines the extent to which motor behaviors can be built up through a combination of a small set of basic elements; the latter emphasizes how the form of the scratch reflex changes dramatically, depending on the locus of the irritant. Other articles in the road map Motor Pattern Generators describe mechanisms underlying a variety of forms of locomotion (swimming, walking, flying).
Sound Localization and Binaural Processing uses data from owls, which are exquisitely skillful in using auditory signals to locate their prey, even in the dark, to anchor models which explain how information from the two ears is brought together to localize the source of a sound. The article focuses on the use of interaural time difference (ITD) as one way to estimate the azimuthal angle of a sound source. It describes one biological model (ITD detection in the barn owl’s brainstem) and two psychological models. The underlying idea is that the brain attempts to match the sounds in the two ears by shifting one sound relative to the other, with the shift that produces the best match assumed to be the one that just balances the “real” ITD.
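The matching-by-shifting idea underlying these models can be sketched in a few lines; the signal length, the shift range, and the dot-product match score are illustrative assumptions, not the barn owl circuitry:

```python
import numpy as np

def estimate_itd(left, right, max_shift=20):
    """Shift the right-ear signal against the left-ear signal and keep
    the shift giving the best match; that shift estimates the ITD."""
    shifts = np.arange(-max_shift, max_shift + 1)
    mid_left = left[max_shift:-max_shift]        # trim edges to limit wraparound
    scores = [np.dot(mid_left, np.roll(right, -s)[max_shift:-max_shift])
              for s in shifts]
    return int(shifts[int(np.argmax(scores))])

rng = np.random.default_rng(2)
sound = rng.standard_normal(500)
left = sound
right = np.roll(sound, 5)          # sound reaches the right ear 5 samples later
itd = estimate_itd(left, right)    # recovers 5, the shift that balances the ITD
```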
Echolocation: Cochleotopic and Computational Maps explores the highly specialized auditory system used by mustached bats to analyze the return signals from the biosonar pulses they emit for orientation and for hunting flying insects. Each biosonar pulse consists of a long constant-frequency (CF) component followed by a short frequency-modulated (FM) component. The CF components constitute an ideal signal for target detection and the measurement of target velocity (relative motion in a radial direction and wing beats of insects), whereas the short FM components are suited for ranging, localizing, and characterizing a target. The article shows how different parameters of echoes received by the bat carry different types of information about a target and how these may be structured in computational maps via parallel-hierarchical processing of different types of biosonar signals. These maps guide the bat’s behavior. Electrolocation discusses another “exotic” sensory system related to behavior, this time the electrosensory systems of weakly electric fish. Animals with active electrosensory systems generate an electrical field around their body by means of an electrical organ located in the trunk and tail, and measure this field via electroreceptors embedded in the skin. Distortions of the electrical field due to animate or inanimate targets in the environment or signals generated by other fish provide inputs to the system, and several distinct behaviors can be linked to patterns of electrosensory input. The article focuses on progress in understanding electrolocation behavior and on the neural implementation of an adaptive filter that attenuates the effects of the fish’s own movements.
We now turn to motor systems. Crustacean Stomatogastric System shows that work on the rhythmic motor patterns of the four areas of the crustacean stomach, the esophagus, cardiac sac, gastric mill, and pylorus, has identified four widely applicable properties. First, rhythmicity in these highly distributed networks depends on both network synaptic connectivity and slow active neuronal membrane properties. Second, modulatory influences can induce individual networks to produce multiple outputs, “switch” neurons between networks, or fuse individual networks into single larger networks. Third, modulatory neuron terminals receive network synaptic input. Modulatory inputs can be sculpted by network feedback and become integral parts of the networks they modulate. Fourth, network synaptic strengths can vary as a function of pattern cycle period and duty cycle.
The lamprey is a very primitive form of fish whose spinal cord supports a traveling wave of activity that yields the swimming movements of the animal’s body, yet also persists (“fictive swimming”) when the spinal cord is isolated from the body and kept alive in a dish. Spinal Cord of Lamprey: Generation of Locomotor Patterns reviews the data which ground a circuit diagram for the spinal cord circuitry, then shows how the lamprey locomotor network has been simulated. There are a number of neuromodulators present in the lamprey spinal cord that alter the output of the locomotor network. These substances, such as serotonin, dopamine, and tachykinins, offer good opportunities to test our knowledge of the locomotor system by combining the cellular and synaptic actions of the modulators into detailed network models. However, models that do not depend on details of individual cells have also proved useful in advancing our understanding of lamprey locomotion, such as the control of turning. Other models probe the nature of the coupling among the rhythm generators, explaining how it may be that the speed of the head-to-tail propagation of the rhythmic activity down the spinal cord can vary with the speed of swimming even though conduction delays in axons are fixed. Locust Flight: Components and Mechanisms in the Motor stresses that locust flight motor patterns are generated by an interactive mixture of the intrinsic properties of flight neurons, the operation of complex circuits, and phase-specific proprioceptive input. These mechanisms are subject to the concentrations of circulating neuromodulators and are also modulated according to the demands of a constantly changing sensory environment to produce adaptive behaviors. The system is flexible and able to operate despite severe ablations, and then to recover from these lesions. Sensorimotor Interactions and Central Pattern Generators analyzes basic properties of the biological systems performing sensorimotor integration.
The article discusses both the impact of sensory information on central pattern generators and the less well-understood influence of motor systems on sensory activity. Interaction between motor and sensory systems is pervasive, from the first steps of sensory detection to the highest levels of processing. While there is no doubt that cortical systems contribute to sensorimotor integration, the article questions the view that motor cortex sends commands to a passively responsive spinal cord. Motor commands are only acted upon as spinal circuits integrate their intrinsic activity with all incoming information.
Turning to evolution, we find two classes of articles. We first look at those which focus on simulated evolution in ANNs, with emphasis on the role of evolution as an alternative learning mechanism to fit network parameters to yield a network better adapted to a given task. We then turn to articles more closely related to comparative and evolutionary neurobiology.
When neural networks are studied in the broader biological context of artificial life (i.e., the attempt to synthesize lifelike phenomena within computers and other artificial media), they are sometimes characterized by genotypes and viewed as members of evolving populations of networks in which genotypes are passed on from parents to offspring. Evolution of Artificial Neural Networks shows how ANNs can be evolved by using evolutionary algorithms (also known as genetic algorithms). An initial population of different artificial genotypes, each encoding the free parameters (e.g., the connection strengths and/or the architecture of the network and/or the learning rules) of a corresponding neural network, is created randomly. (An important challenge for future research is to study models in which the genotypes are more “biological” in nature, and less closely tied to direct description of the phenotype.) The population of networks is evaluated in order to determine the performance (fitness) of each individual network. The fittest networks are allowed to reproduce by generating copies of their genotypes, with the addition of changes introduced by genetic operators such as mutations (i.e., the random change of a few genes that are selected randomly) or crossover (i.e., the combination of parts of the genotype derived from two reproducing networks). This process is repeated for a number of generations until a network that satisfies the performance criterion set by the experimenter is obtained.
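The evaluate-select-reproduce-mutate loop just described can be sketched as follows; the tiny 2-2-1 network, the fitness function, the XOR task, and all population parameters are assumptions made for illustration (crossover is omitted for brevity, leaving mutation as the only genetic operator):

```python
import numpy as np

def fitness(genotype, X, y):
    """Negative squared error of a tiny 2-2-1 tanh network whose
    free parameters (9 weights and biases) form the genotype."""
    W1, b1 = genotype[:4].reshape(2, 2), genotype[4:6]
    w2, b2 = genotype[6:8], genotype[8]
    out = np.tanh(X @ W1 + b1) @ w2 + b2
    return -np.mean((out - y) ** 2)

def evolve(X, y, pop_size=80, generations=300, seed=3):
    """Evaluate the population, let the fittest reproduce, mutate."""
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, 9))                # random initial genotypes
    for _ in range(generations):
        scores = np.array([fitness(g, X, y) for g in pop])
        parents = pop[np.argsort(scores)[-pop_size // 4:]]  # fittest quarter
        pop = parents[rng.integers(len(parents), size=pop_size)]
        pop = pop + 0.1 * rng.standard_normal(pop.shape)    # mutation
        pop[0] = parents[-1]                                # keep the current best
    scores = np.array([fitness(g, X, y) for g in pop])
    return pop[np.argmax(scores)]

# Evolve a network that computes XOR, a task no single-layer network can learn.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
best = evolve(X, y)
```

Note that evolution here adjusts the same free parameters that gradient-based learning would, but does so by selection over a population rather than by error propagation within an individual network.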
Locomotion, Vertebrate shows that the combination of neural models with biomechanical models has an important role to play in addressing the evolutionary challenge of seeing what modifications may have occurred in the locomotor circuits between the generation of traveling waves for swimming (the most ancestral vertebrates were close to the lamprey), the generation of standing waves for walking, and the generation of multiple gaits for quadruped locomotion, and on to biped locomotion. One example uses “genetic algorithms” to model the transition from a lamprey-like spinal cord that supports traveling waves to a salamander-like spinal cord that supports both traveling waves for swimming and “standing waves” for terrestrial locomotion, and then shows how vision may modulate spinal activity to yield locomotion toward a goal (see also Visuomotor Coordination in Salamander).
Evolution and Learning in Neural Networks then extends the analysis of ANN evolution to networks that are able to adapt to the environment as a result of some form of lifetime learning. Whereas evolution is capable of capturing relatively slow environmental changes that might encompass several generations, learning allows an individual to adapt to environmental changes that are unpredictable at the generational level. Moreover, while evolution operates on the genotype, learning affects the phenotype, and phenotypic changes cannot directly modify the genotype. The article shows how ANNs subjected to both an evolutionary and a lifetime learning process have been studied to look at the advantages, in terms of performance, of combining two different adaptation techniques and also to help understand the role of the interaction between learning and evolution in natural organisms. Continuing this theme, Language Evolution and Change offers another style of “connectionist evolution,” placing a number of connectionist models of basic forms of language processing in an evolutionary perspective. In some cases, connectionist networks are used as simulated agents to study how social transmission via learning may give rise to the evolution of structured communication systems. In other cases, the specific properties of neural network learning are enlisted to help illuminate the constraints and processes that may have been involved in the evolution of language. The article surveys this connectionist research, starting from the emergence of early syntax, to the role of social interaction and constraints on network learning in the subsequent evolution of language, to linguistic change within existing languages.
With this we turn to the study of evolution in the sense of natural selection in biological systems, building on the insights of Charles Darwin. Since brains do not leave fossils, evolutionary work is more at the level of comparative neurobiology, looking at the nervous systems of currently extant species, then trying to build a “family tree” of possible ancestors. The idea is that we may gain deeper insights into the brains of animals of a given species if we can compare them with the brains of other species, make plausible inferences about the brain structure of their common ancestor, and then seek to explain the differences between current brains and the putative ancestral brain in terms of the evolutionary pressures that led each species to adapt to a specific range of environments. Evolution of the Ancestral Vertebrate Brain notes that efforts to understand how the evolving brain has adapted to specific environmental constraints are complicated because there are always several ways to implement a certain function within existing connections using molecular and cellular mechanisms. In any case, adult diversity is viewed as the outcome of divergent genetic developmental mechanisms. Thus, study of adult structures is aided by placing adult structures within their developmental history as structured by the genes that guide such development. The article introduces a possible prototype of the ancestral vertebrate brain, followed by a scenario for mechanisms that may have diversified the ancestral vertebrate brain. Evolution of the brainstem oculomotor system is used as a focal case study.
The study of gene expression patterns is playing an increasingly important role in the empirical study of brains and neurons, and the pace of innovation in this area has greatly accelerated with the publication of two maps of the human genome as well as genome maps for more and more other species. As of 2002, however, the impact of “genomic neuroscience” on computational neuroscience is still small. To help readers think about the promise of increasing this impact, we not only have the discussion in Evolution of the Ancestral Vertebrate Brain of how during development the CNS becomes polarized and then subdivides into compartments, each characterized by a specific pattern of gene expression, but also a companion article, Evolution of Genetic Networks, which outlines some of the computational problems in modeling genetic networks that can direct the establishment of a diversity of neuronal networks in the brain. Since neuronal networks are composed of a wide variety of different cell types, the final fate or end-stage of each cell type represents the outcome of a dynamic amalgamation of gene networks. Genetic networks not only determine the cell fate acquisition from the original stem cell, they also govern contact formation between the cell populations of a given neuronal network. There are intriguing parallels between the establishment and functioning of genetic networks and those of neuronal networks, which can range from simple (on-off switch) to complex. To give some sense of the complexity of organismic development, the article outlines how intracellular as well as cell-cell interactions modify the complexity of gene interactions involved in genetic networks to achieve an altered status of cell function and, ultimately, the connection alterations in the formation of neuronal networks.
Olfactory Cortex describes how, during phylogeny, the paleocortex and archicortex develop in extent and complexity but retain their three-layered character, whereas neocortex emerges in mammals as a five- to six-layered structure. It stresses the evolutionary significance of the olfactory cortex and includes, for brain theorists interested in principles of cortical organization, an account of the early appearance of the olfactory cortex in phylogeny. Certainly, the cerebral cortex is a distinctive evolutionary feature of the mammalian brain (which does not mean that it is “better” than structures in other genera to which it may be more or less related), and the next articles give two perspectives on its structure. “Grasping Movements: Visuomotor Transformations” presents the interactions of visual areas of parietal cortex with the F5 area of premotor cortex in the monkey brain in serving the visual control of hand movements. The companion article, Language Evolution: The Mirror System Hypothesis, starts from the observations that monkey F5 contains a special set of “mirror neurons” active not only when the monkey performs a specific grasp, but also when the monkey sees others perform a similar task; that F5 is homologous to human Broca’s area, an area of cortex usually thought of as related to speech production; but that Broca’s area also seems to contain a mirror system for grasping. These facts are used to ground a new theory of the evolution of the human brain mechanisms that support language. It adds a neurological “missing link” to the long-held view that imitation and communication based on hand signs may have preceded the emergence of human mechanisms for extensive vocal communication. With this example to hand, the reader is invited to look through the book for articles that study specific brain mechanisms or specific behaviors in a number of species more or less related to the human.
The challenge then is to chart what aspects are common to human brains and the brains more generally of primates, mammals, or even vertebrates; and then, having done so, to see what, if any, distinctive properties human brain and behavior possess. One can then seek an evolutionary account which illuminates these human capacities. For example, it is well known that the human hippocampus is crucial for the creation of episodic memories, our memories of episodes located in specific contexts of space and time (though these memories are eventually consolidated outside hippocampus). On the other hand, Hippocampus: Spatial Models emphasizes the role of the hippocampus and related brain regions in building a map of spatial relations in the rat’s world. To what extent can we come to better understand human episodic memory by looking for the generalization from a spatial graph of the environment to one whose nodes are linked in both space and time?
Auditory Cortex
Auditory Periphery and Cochlear Nucleus
Basal Ganglia
Cerebellum and Conditioning
Cerebellum and Motor Control
Collicular Visuomotor Transformations for Gaze Control
Grasping Movements: Visuomotor Transformations
Hippocampal Rhythm Generation
Hippocampus: Spatial Models
Motor Cortex: Coding and Decoding of Directional Operations
Neuroanatomy in a Computational Perspective
Olfactory Bulb
Olfactory Cortex
Prefrontal Cortex in Temporal Organization of Action
Retina
Somatosensory System
Thalamus
Visual Cortex: Anatomical Structure and Models of Function
Visual Scene Perception, Neurophysiology
This road map introduces the conceptual analysis and neural network modeling of a variety of regions of the mammalian brain. The fact that these regions recur in many articles not listed above emphasizes complementary ways of exploring the mammalian brain in a top-down fashion, starting either from the gross anatomy (what does this region of the brain do? the approach of this road map) or from some function (sensory, perceptual, memory, motor control, etc., as in other road maps). These top-down approaches may be contrasted with bottom-up approaches, which may start from neurons and seek to infer properties of circuits, or may start from biophysics and neurochemistry and seek to infer properties of neurons (as in much of the road map Biological Neurons and Synapses). It must be stressed that whole books can be, and have been, written on each of the brain regions discussed below. The aim of each article is to get the reader started by seeing how a selection of biological data can be addressed by models that seek to illuminate them. In some cases (especially near the sensory or motor periphery), the function of a region is clear, and there is little question as to which phenomena the brain theorist is to explain. But in more central regions, what the experimentalist observes may vary wildly with the questions that are asked, and what the modeler has to work with is more like a Rorschach blot than a well-defined picture.
Neuroanatomy in a Computational Perspective notes that a vertebrate brain contains so many neurons, and each neuron has so many connections, that the task of neuroanatomy is not so much to study all the connections in detail, but rather to reveal the typical structural properties of a particular part of the brain, which then provide clues to understanding its various functions. Large brains have comparatively more cortical white matter (i.e., regions containing only axons) than small brains. Moreover, distant elements in a large brain may not be able to collaborate efficiently because of the transmission delays from one point to the other. The way out of this problem may be a higher degree of functional specialization of cortical regions in larger brains. The article provides many data on cortical structure (my favorite is that there are 4 kilometers of axons in each cubic millimeter of mouse cortex) and argues that the data fit Hebb’s theory of cell assemblies in that precisely predetermined connections are not required since, as a result of learning, the patterns of interactions between neurons will be different for each brain. What is crucial, however, is an initial connectivity sufficiently rich to allow as many constellations of neuronal activity as possible to be detected and “learned” in the connections.
Many of the articles in this road map are associated with sensory systems: vision, body sense (the somatosensory system), hearing (the auditory system), and smell (the olfactory system). Several articles then discuss brain regions more associated with motor control, learning, and cognition. Five articles take us through the visual system. We start with the Retina, the outpost of the brain that contains both light-sensitive receptors and several layers of neurons that “preprocess” these responses. Instead of simply coding light intensity, the retina transforms visual signals in a multitude of ways to code properties of the visual world, such as contrast, color, and motion. The article develops a conceptual theory to explain how the structure of the retina is related to its function of coding visual signals. One hypothesis is that much of the retina’s signal coding and structural detail is derived from the need to optimally amplify the signal and eliminate noise. But retinal circuitry is diverse. The exact details are probably related to the ecological niche occupied by the organism. In mammals, the retinal output branches into two pathways, the collicular pathway and the geniculostriate pathway. The destination of the former is the midbrain region known as the superior colliculus. Collicular Visuomotor Transformations for Gaze Control charts the role of this brain region in controlling saccades, rapid eye movements that bring visual targets onto the fovea. Even this basic activity involves the cooperation of many brain regions; conversely, the function of the superior colliculus is not restricted to eye movements (“Visuomotor Coordination in Frog and Toad” charts the role of the tectum, which is homologous to the superior colliculus, in approach and avoidance behavior). 
By virtue of its topographical organization, the superior colliculus has become a key area for experimental and modeling approaches to the question of how sensory signals can be transformed into goal-directed movements. Moreover, the activity in the superior colliculus during saccades to auditory and somatosensory targets conforms to the same motor map, suggesting that considerable sensorimotor remapping must take place.
In mammals, the geniculostriate pathway travels from retina via a specialized region of thalamus called the lateral geniculate nucleus to the primary visual cortex, which is also called the striate cortex because of its somewhat striated appearance. The thalamus has many divisions, not only those involved with sensory pathways, but also those involved in loops linking the cortex to other brain regions like the cerebellum and basal ganglia. The thalamus is the gateway through which all sensory inputs, except olfaction, are relayed to the cortex. Thalamus shows that the thalamus can effectively control the flow of information to the cortex: during waking, it may subserve attention, selectively enhancing certain inputs to the cortex and attenuating others; during slow-wave sleep, the firing mode of thalamic cells changes, effectively closing the gateway and diminishing the influence of external stimuli on the cortex. Massive feedback from the cortex to the thalamus suggests that the entire thalamo-cortico-thalamic loop plays a role in sustaining and synchronizing cortical activity. Furthermore, certain thalamic nuclei appear to constitute an integral part of the signal flow between different cortical areas. The article reviews the anatomical and neurophysiological data and concludes with a brief discussion of models of the role of thalamus in thalamocortical interactions (see also “Adaptive Resonance Theory” and “Sleep Oscillations”), arguing that the organization of the projections to and from the thalamus is essential to understanding thalamic function.
Visual Cortex: Anatomical Structure and Models of Function reviews features of the microcircuitry of the primary visual cortex, area V1, and physiological properties of cells in its different laminae. It then outlines several hypotheses as to how the anatomical structure and connections might serve the functional organization of the region. For example, a connectionist model of layer IVc of V1 demonstrated that the gradient of change in properties of the layer could indeed be replicated using dendritic overlap through the lower two-thirds of the IVc layer, but was insufficient to explain the continuous and sharply increasing field size and contrast sensitivity observed near the top of the layer. This discrepancy led to new experiments and related changes in the model, which resulted in a good replication of the actual physiological data and required only feedforward excitation. The article continues by analyzing the anatomical substrates for orientation specificity and for surround modulation of visual responses, and concludes by discussing the origins of patterned anatomical connections. Visual Scene Perception, Neurophysiology moves beyond V1 to chart the bifurcation of V1 output in monkeys and humans into a pathway that ascends to the parietal cortex (the dorsal “where/how” system involved in object location and setting of parameters for action) and a pathway that descends to inferotemporal cortex (the ventral “what” system involved in object recognition) (see also “Dissociations Between Visual Processing Modes”).
Somatosensory System argues that the tactile stimulus representation changes from an original form (more or less isomorphic to the stimulus itself) to a completely distributed form (underlying perception) in a series of partial transformations in successive subcortical and cortical networks. At the level of primary somatosensory cortex, the neural image of the stimulus is sensitive to shape and temporal features of peripheral stimuli, rather than simply reflecting the overall intensity of local stimulation. The processing of somatosensory information is seen as modular on two different scales: macrocolumnar in terms of “segregates” such as the cortical barrels seen in rodent somatosensory cortex, each of which receives its principal input from one of the facial whiskers; and minicolumnar, with each minicolumn in a segregate receiving afferent connections from a unique subset of the thalamic neurons projecting to that segregate. The article argues that the causal factors involved in body/object interactions are represented by the pyramidal cells of somatosensory cortical areas in such a way that their ascending, lateral, and feedback connections develop an internal working model of mechanical interactions of the body with the outside world. Such an internal model can endow the somatosensory cortex with powerful interpretive and predictive capabilities that are crucial for haptic perception (i.e., tactile perception of proximal surroundings) and for control of object manipulation.
The auditory system is introduced in two articles. Auditory Periphery and Cochlear Nucleus spells out how the auditory periphery transforms a very high information rate acoustic stimulus into a series of lower information rate auditory nerve firings, with the incoming acoustic information split across hundreds of nerve fibers to avoid loss of information. The transformation involves complex mechanical-to-electrical transformations. The cochlear nucleus continues this process of parallelization by creating multiple representations of the original acoustic stimulus, with each representation presumably emphasizing different acoustic features that are fed to other brainstem structures, such as the superior olivary complex, the nuclei of the lateral lemniscus, and the inferior colliculus. These parallel pathways are believed to be specialized for the processing of different auditory features used for sound source classification and localization. From the inferior colliculus, auditory information is passed via the medial geniculate body in the thalamus to the auditory cortex. Auditory Cortex stresses the crucial role that auditory cortex plays in the perception and localization of complex sounds. Although recent studies have expanded our knowledge of the neuroanatomical structure, the subdivisions, and the connectivities of all central auditory stages, relatively little is known about the functional organization of the central auditory system. Nevertheless, a few auditory tasks have been broadly accepted as vital for all mammals, such as sound localization, timbre recognition, and pitch perception. The article discusses a few of the functional and stimulus feature maps that have been found or postulated, and relates them to the more intuitive and better understood case of the echolocating bats (cf. “Echolocation: Cochleotopic and Computational Maps”).
The olfactory system is distinctive in that paths from periphery to cortex do not travel via a thalamic nucleus. The olfactory pathway begins with the olfactory receptor neurons in the nose, which project their axons to the olfactory bulb. The function of the olfactory bulb is to perform the initial stages of sensory processing of the olfactory signals before sending this information to the olfactory cortex. The study of the olfactory system offers prime examples of seeking a “basic circuit” that defines the irreducible minimum of neural components necessary for a model of the functions carried out by a region. Olfactory Bulb offers examples of information processing without impulses and of output functions of dendrites (dendrodendritic synapses). The olfactory cortex is defined as the region of the cerebral cortex that receives direct connections from the olfactory bulb and is subdivided into several areas that are distinct in terms of details of cell types, lamination, and sites of output to the rest of the brain. The main area involved in olfactory perception is the piriform (also called prepyriform) cortex, which projects to the mediodorsal thalamus, which in turn projects to the frontal neocortex. The piriform cortex is often regarded as the primary olfactory cortex, and is the subject of the article Olfactory Cortex. Olfactory cortex is the earliest cortical region to differentiate in the evolution of the vertebrate forebrain and is the only region within the forebrain to receive direct sensory input. Models of olfactory cortex emphasize the importance of cortical dynamics, including the interactions of intrinsic excitatory and inhibitory circuits and the role of oscillatory potentials in the computations performed by the cortex.
We now introduce motor cortex, then turn to three systems related to motor control and to visuomotor coordination in mammals (cf. the road map Mammalian Motor Control): cortical areas involved in grasping, the basal ganglia, and cerebellum. Motor Cortex: Coding and Decoding of Directional Operations spells out the relation between the direction of reaching and changes in neuronal activity that have been established for several brain areas, including the motor cortex. The cells involved each have a broad tuning function, the peak of which is viewed as the “preferred” direction of the cell. A movement in a particular direction will engage a whole population of cells. It is found that the weighted vector sum of their neuronal preferences is a “population vector” which points in, or close to, the direction of the movement for discrete movements in 2D and 3D space. Grasping Movements: Visuomotor Transformations shows the tight coupling between (specific subregions of) parietal and premotor cortex in controlling grasping. The AIP region of inferior parietal lobe appears to play a fundamental role in extracting intrinsic visual properties (“affordances”) from the object for organizing grasping movements. The extracted visual information is then sent to the F5 region of premotor cortex, there activating neurons that code grip types congruent with the size, shape, and orientation of the object. In addition to visually activated neurons in AIP, there are AIP cells whose activity is linked to motor activity, possibly reflecting corollary discharges sent by F5 back to the parietal cortex. (For the possible relation of grasping to language, and the homology between F5 and Broca’s area, see “Language Evolution: The Mirror System Hypothesis.”)
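The population-vector computation just described is simple enough to sketch directly. In the following illustrative simulation (cell count, tuning parameters, and noise level are invented, not taken from the article), cosine-tuned cells each “vote” for their preferred direction with a weight given by their firing rate, and the vector sum recovers the direction of movement:

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 200
preferred = rng.uniform(0.0, 2.0 * np.pi, n_cells)  # each cell's preferred direction
movement = np.pi / 3                                # actual direction of movement

# Broad cosine tuning plus noise; firing rates cannot go below zero.
rates = np.clip(10.0 + 20.0 * np.cos(movement - preferred)
                + rng.normal(0.0, 1.0, n_cells), 0.0, None)

# Population vector: each cell votes for its preferred direction,
# weighted by how far its rate departs from the population mean.
weights = rates - rates.mean()
pop_x = np.sum(weights * np.cos(preferred))
pop_y = np.sum(weights * np.sin(preferred))
decoded = np.arctan2(pop_y, pop_x) % (2.0 * np.pi)

print(f"movement at {np.degrees(movement):.1f} deg; "
      f"population vector points at {np.degrees(decoded):.1f} deg")
```

Even with noisy, rectified rates, the decoded direction lands close to the true one, which is the essential observation behind population-vector decoding.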
The basal ganglia include the striatum, the globus pallidus, the substantia nigra, and the subthalamic nucleus. Basal Ganglia stresses that all of these structures are functionally subdivided into skeletomotor, oculomotor, associative, and limbic territories. The basal ganglia can be viewed as a family of loops, each taking its origin from a particular set of functionally related cortical fields, passing through the functionally corresponding portions of the basal ganglia, and returning to parts of those same cortical fields by way of specific zones in the dorsal thalamus. The article reviews models of the basal ganglia that attempt to incorporate appropriate anatomical or physiological data, but not those that use only generic neural network architectures. Some models work at a comparatively low level of detail (membrane properties of individual neurons and microanatomical features) and restrict themselves to a single component of the basal ganglia nucleus; others work at the system level with the basal ganglia as a whole and with their interactions with related structures (e.g., thalamus and cortex). Since dopamine neurons discharge in relation to conditions involving the probability and imminence of behavioral reinforcement, dopamine neurons have been seen as playing a role in striatal information processing analogous to that of an “adaptive critic” in connectionist networks (cf. “Reinforcement Learning” and “Dopamine, Roles of”).
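The “adaptive critic” analogy can be made concrete with a minimal temporal-difference (TD) learning sketch. Everything here (a simple chain of states, the learning rate, the discount factor) is illustrative rather than drawn from the article; the TD error `delta` plays the role attributed to the dopamine signal:

```python
import numpy as np

n_states = 5
V = np.zeros(n_states + 1)   # value estimates; extra entry is the terminal state
alpha, gamma = 0.1, 0.9      # learning rate and discount factor

for episode in range(500):
    for s in range(n_states):            # walk along the chain toward the reward
        reward = 1.0 if s == n_states - 1 else 0.0
        # TD error: the "dopamine-like" surprise signal of the adaptive critic
        delta = reward + gamma * V[s + 1] - V[s]
        V[s] += alpha * delta

print(np.round(V[:n_states], 3))         # values grow toward the rewarded state
```

After training, states closer to the reward have higher values, anticipating reinforcement before it arrives, just as dopamine neurons come to fire to predictors of reward rather than to the reward itself.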
The division of function between cerebellum and basal ganglia remains controversial. One view is that the basal ganglia play a role in determining when to initiate one phase of movement or another, and that the cerebellum adjusts the metrics of movement, tuning different movements and coordinating them into a graceful whole. Cerebellum and Motor Control reviews a number of models for cerebellar mechanisms underlying the learning of motor skills. Cerebellum can be decomposed into cerebellar nuclei and a cerebellar cortex. The only output cells of the cerebellar cortex are the Purkinje cells, and their only effect is to provide varying levels of inhibition on the cerebellar nuclei. Each Purkinje cell receives two types of input—a single climbing fiber, and many tens of thousands of parallel fibers. The most influential model of cerebellar cortex has been the Marr-Albus model of the formation of associative memories between particular patterns on parallel fiber inputs and Purkinje cell outputs, with the climbing fiber acting as “training signal.” Later models place more emphasis on the relation between the cortex and nuclei, and on the way in which the subregions of this coupled cerebellar system can adapt and coordinate the activity of specific motor pattern generators. The plasticity of the cerebellum is approached from a different direction in Cerebellum and Conditioning. Many experiments indicate that the cerebellum is involved in learning and performance of classically conditioned reflexes. The article reviews a number of models of the role of cerebellum in rabbit eye blink conditioning, providing a useful complement to models of the role of cerebellum in motor control.
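The Marr-Albus idea of the Purkinje cell as an associative memory trained by the climbing fiber can be caricatured as a perceptron. This sketch is a deliberate simplification with invented parameters (sparse random parallel fiber patterns, an arbitrary threshold), not one of the models reviewed in the article; its point is only that the climbing-fiber “training signal” adjusts the synapses of currently active parallel fibers:

```python
import numpy as np

rng = np.random.default_rng(1)

n_pf = 200                                 # parallel fibers onto one Purkinje cell
w = np.full(n_pf, 0.5)                     # parallel fiber -> Purkinje weights
patterns = rng.random((10, n_pf)) < 0.1    # sparse parallel fiber activity patterns
targets = rng.random(10) < 0.5             # desired Purkinje response per pattern

eta, theta = 0.05, 2.0                     # learning rate and firing threshold
for _ in range(200):
    for x, target in zip(patterns, targets):
        fires = w @ x > theta
        # The climbing fiber signals an error, modifying only the synapses
        # of the parallel fibers active in the current pattern.
        if fires and not target:
            w[x] -= eta                    # long-term depression (LTD)
        elif target and not fires:
            w[x] += eta                    # potentiation
        w = np.clip(w, 0.0, 1.0)

correct = sum(int((w @ x > theta) == t) for x, t in zip(patterns, targets))
print(f"{correct}/10 patterns produce the trained response")
```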
The hippocampus has been implicated in a variety of memory functions, both as working memory and as a basis for long-term memory. It was also the site for the discovery of long-term potentiation (LTP) in synapses (see “Hebbian Synaptic Plasticity”). Structurally, hippocampus is the simplest form of cortex. It contains one projection cell type, whose cell bodies are confined to a single layer, and receives inputs from all sensory systems and association areas. Hippocampus: Spatial Models builds on the finding that single-unit recordings in freely moving rats have revealed “place cells” in subfields of the hippocampus whose firing is restricted to small portions of the rat’s environment (the corresponding “place fields”). These data underlie the seminal idea of the hippocampus as a spatial map (cf. “Cognitive Maps”). The article reviews the data and describes some models of hippocampal place cells and of their role in circuits controlling the rat’s navigation through its environment. Hippocampal Rhythm Generation provides data and models on theta and other rhythms as well as epileptic discharges, and also introduces the key cell types of the hippocampus and a number of interconnections within the hippocampus that seem to play a key role in the generation of these patterns of activity.
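A toy model conveys the essence of place-cell coding and of reading a position estimate out of a population of place fields. All parameters here (track length, field width, cell count) are illustrative, not taken from the models the article reviews:

```python
import numpy as np

rng = np.random.default_rng(2)

n_cells = 50
centers = rng.uniform(0.0, 1.0, n_cells)   # place-field centers along a 1 m track
sigma = 0.1                                # place-field width
position = 0.42                            # the rat's actual position

# Gaussian place fields: each cell fires most when the rat is at its center.
rates = np.exp(-(position - centers) ** 2 / (2 * sigma ** 2))

# Simple population decoding: rate-weighted average of the field centers.
decoded = np.sum(rates * centers) / np.sum(rates)
print(f"actual position {position:.3f} m, decoded {decoded:.3f} m")
```

Because each cell’s firing is restricted to a small portion of the environment, the population as a whole carries an accurate position signal, which is what models of hippocampally guided navigation exploit.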
Finally, we turn to prefrontal cortex, the association cortex of the frontal lobes. It is one of the cortical regions to develop last and most in the course of both primate evolution and individual ontogeny. Prefrontal Cortex in Temporal Organization of Action suggests that the late morphological development of this cortex in both cases is related to its support of higher cognitive functions involving the capacity to execute novel and complex actions, which reaches its maximum in the adult human brain. The lateral region of prefrontal cortex is involved in the representation and temporal organization of sequential behavior. This article emphasizes the physiological functions of the lateral prefrontal cortex in the temporal organization of behavior. Temporal integration of sensory and motor information, through active short-term memory (working memory) and prospective set, supports the goal-directed performance of the perception-action cycle. This role extends to the temporal organization of higher cognitive operations, including, in the human, language and reasoning.
Cortical Memory
Covariance Structural Equation Modeling
EEG and MEG Analysis
Emotional Circuits
Event-Related Potentials
Hemispheric Interactions and Specialization
Imaging the Grammatical Brain
Imaging the Motor Brain
Imaging the Visual Brain
Imitation
Lesioned Networks as Models of Neuropsychological Deficits
Neurolinguistics
Neurological and Psychiatric Disorders
Neuropsychological Impairments
Prefrontal Cortex in Temporal Organization of Action
Sequence Learning
Statistical Parametric Mapping of Cortical Activity Patterns
Synthetic Functional Brain Mapping
Cognitive neuroscience has been boosted tremendously in the last decade by the rapid development and increasing use of techniques to image the active human brain. The road map thus starts with several articles on ways of observing activity in the human brain and then examines various human cognitive functions.
The organization of large masses of neurons into synchronized waves of activity lies at the basis of phenomena such as the electroencephalogram (EEG) and evoked potentials, as well as the magnetoencephalogram (MEG). The EEG consists of the electrical activity of relatively large neuronal populations that can be recorded from the scalp, while the MEG can be recorded using very sensitive transducers arranged around the head. EEG and MEG Analysis reviews methods of quantitative analysis that have been applied to extract information from these signals, providing an indispensable tool for sleep and epilepsy research. Epilepsy is a neurological disorder characterized by the occurrence of seizures, sudden changes in neuronal activity that interfere with the normal functioning of neuronal networks, resulting in disturbances of sensory or motor activity and of the flow of consciousness. During an epileptic seizure, the neuronal network exhibits typical oscillations that usually propagate throughout the brain, involving progressively more brain systems. These oscillations are revealed in the EEG (see also “Hippocampal Rhythm Generation”). In general, the same brain sources account for the EEG and the MEG, with the reservation that the MEG reflects magnetic fields perpendicular to the skull that are caused by tangential current dipolar fields, whereas the EEG reflects both radial and tangential fields. This property can be used advantageously to disentangle radial sources lying in the convexity of cortical gyri from tangential sources lying in the sulci.
Event-Related Potentials shows how cortical event-related potentials (ERPs) arise from synchronous interactions among large numbers of participating neurons. These include dense local interactions involving excitatory pyramidal neurons and inhibitory interneurons, as well as long-range interactions mediated by axonal pathways in the white matter. Depending on the types of interaction that occur in a specific behavioral condition, cortical networks may display different states of synchrony, causing their ERPs to oscillate in different frequency bands, designated delta (0–4 Hz), theta (5–8 Hz), alpha (9–12 Hz), beta (13–30 Hz), and gamma (31–100 Hz). Depending on the location and size of the recording and reference electrodes, recorded cortical field potentials integrate neural activity over a range of spatial scales: from the intracortical local field potential (LFP) to the intracranial electrocorticogram (ECoG) to the extracranial electroencephalogram (EEG). ERP studies have shown that local cortical area networks are able to synchronize and desynchronize their activity rapidly with changes in cognitive state. If these principles were incorporated into ANNs, the result could be a metastable large-scale neural network design that recruits and excludes subnetworks according to their ability to reach consensual local patterns, one able to implement behavioral schemas and adapt to changing environmental conditions.
Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) provide means for seeing which brain regions are significantly more active in one task rather than another. Functional neuroimaging is generally used to make inferences about functional anatomy on the basis of evoked patterns of cortical activity. Functional anatomy involves an understanding of what each part of the brain does, and how different brain systems interact to support various sensorimotor and cognitive functions. Large-scale organization can be inferred from techniques that image the hemodynamic and metabolic sequelae of evoked neuronal responses. PET measures regional cerebral blood flow (rCBF) and fMRI measures oxygenation changes. Their spatial resolution is on the order of a few millimeters. Because PET uses radiotracers, its temporal resolution is limited to a minute or more by the half-life of the tracers employed. However, fMRI is limited only by the biophysical time constants of hemodynamic responses themselves (a few seconds).
Statistical Parametric Mapping of Cortical Activity Patterns considers the neurobiological motivations for different designs and analyses of functional brain imaging studies, noting that the principles of functional specialization and integration serve as the motivation for most analyses. Statistical parametric mapping (SPM) is used to identify functionally specialized brain regions that respond selectively to experimental cognitive or sensorimotor changes, irrespective of changes elsewhere. SPM is a voxel-based approach (a voxel is a volume element of a 3D image) employing standard inferential statistics. SPM is a mass-univariate approach, in the sense that each data sequence, from every voxel, is treated as a univariate response. The massive numbers of voxels are analyzed in parallel, and dependencies among them are dealt with using random field theory (see “Markov Random Field Models in Image Processing”).
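The mass-univariate logic can be illustrated with synthetic data: fit the same simple statistical test at every voxel in parallel, then threshold the resulting map. This sketch uses a plain two-sample t statistic rather than SPM’s actual general linear model, and all the data and numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

n_voxels, n_scans = 1000, 40
design = np.tile([0, 1], n_scans // 2)           # alternating rest/task scans

data = rng.normal(0.0, 1.0, (n_voxels, n_scans)) # baseline noise at every voxel
data[:50, design == 1] += 1.5                    # 50 truly "active" voxels

# Mass-univariate testing: a two-sample t statistic at every voxel,
# computed in parallel across the whole image.
task, rest = data[:, design == 1], data[:, design == 0]
n1, n0 = task.shape[1], rest.shape[1]
se = np.sqrt(task.var(axis=1, ddof=1) / n1 + rest.var(axis=1, ddof=1) / n0)
t = (task.mean(axis=1) - rest.mean(axis=1)) / se

# With 1000 simultaneous tests, naive thresholds admit false positives;
# SPM addresses this multiplicity using random field theory.
print("voxels with t > 3:", int(np.sum(t > 3.0)))
```

The thresholded map mostly recovers the truly active voxels, while the large number of simultaneous tests is exactly why the dependencies among voxels must be handled with random field theory rather than an uncorrected threshold.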
One approach to systems-level neural modeling aims at determining the network of brain regions mediating a specific cognitive task. This means finding the nodes of the network (i.e., the brain regions), and determining the task-dependent functional strengths of their interregional anatomical linkages. Covariance Structural Equation Modeling for Neurocognitive Networks describes techniques applied to the correlations between PET- or fMRI-determined regional brain activities. These correlations are viewed as “functional connectivities.” They thus vary from task to task, as different patterns of excitation and inhibition are routed through the anatomical connections of these regions. Examples of questions that can be answered using this approach are: (1) As one learns a task, do the functional links between specific brain regions change their values? (2) In cases of similar performance, are the same brain networks being used by normals and patients? The method is illustrated with studies of object and spatial vision showing cross-talk between the dorsal and ventral streams (see “Visual Scene Perception”), which implies that they need not be functionally independent. The article stresses the concept of a neural context, where the functional relevance of a particular region is determined by its interactions with other areas. Because the pattern of interactions with other connected areas differs from task to task, the resulting cognitive operations may vary within a single region as it engages in different tasks.
Synthetic Functional Brain Mapping analyzes ways in which models of neural networks grounded in primate neurophysiology can be used as the basis for predictions of the results of human brain imaging. This is crucial for furthering our understanding of the neural basis of behavior. Covariance structural equation modeling helps identify the nodes of the region-by-region network corresponding to a cognitive task, especially when there is little or no nonhuman data available (e.g., most language tasks). Synthetic functional brain mapping uses primate data to form hypotheses about the neural mechanisms whereby cognitive tasks are implemented in humans, with PET and fMRI data providing constraints on the possible ways in which these neural systems function. This is illustrated in relation to the mechanisms underlying saccadic eye movements and working memory.
The next three articles focus on what we are learning about vision, motor activity, and language from functional brain imaging. Imaging the Visual Brain addresses functional brain imaging of visual processes, with emphasis on limits in spatial and temporal resolution; constraints on subject participation; and trade-offs in experimental design. The article focuses on retinotopy, visual motion perception, visual object representation, and voluntary modulation of attention and visual imagery, emphasizing some of the areas where modeling and brain theory might be testable using current imaging tools. Imaging the Motor Brain shows that the behavioral form and context of a movement are important determinants of functional activity within cortical motor areas and the cerebellum, stressing that functional imaging of the human motor system requires one to study the interaction of neurological and cognitive processes with the biomechanical characteristics of the effectors. Multiple neural systems must interact to successfully perform motor tasks, encode relevant information for motor learning, and update behavioral performance in real time. The article discusses how evidence from functional imaging studies provides insight into motor automaticity as well as the role of internal models in movement. The article also discusses novel mathematical techniques that extend the scope of functional imaging experimentation. Imaging the Grammatical Brain reviews brain imaging results that support the author’s view that linguistic rules are neurally real and form a constitutive element of the human language faculty. The focus is on linguistic combinations at the sentence level; but an analysis of cerebral representation of phonological units and of word meaning in its isolated and compositional aspects is provided as background. The study of brain mechanisms supporting language is further advanced in Neurolinguistics.
Neurolinguistics began as the study of the language deficits occurring after brain injuries, and is rooted in the conceptual model of Broca’s aphasia, Wernicke’s aphasia, and other aphasic syndromes established over a hundred years ago. The article presents data and analyses for between-stage information flow, dynamics of within-stage processing, unitary representations and activation, and processing by constraint satisfaction. (For more background on these two articles, see the road map Linguistics and Speech Processing.)
Prefrontal Cortex in Temporal Organization of Action emphasizes the physiological functions of the lateral prefrontal cortex in the temporal organization of behavior, highlighting active short-term memory (working memory) and prospective set. The two cooperate toward temporally integrating sensory and motor information by mediating cross-temporal contingencies of behavior (see also “Competitive Queuing for Planning and Serial Performance”). This temporal integration supports the goal-directed performance of the perception-action cycle. It is a role that extends to the temporal organization of higher cognitive operations, including language and reasoning in humans. Cortical Memory stresses that some components of memory are localized in discrete domains of cortex, while others are more widely distributed. It outlines a view of network memory in the neocortex that is supported by empirical evidence from neuropsychology, behavioral neurophysiology, and neuroimaging. Its essential features are the acquisition of memory by the formation and expansion of networks of neocortical neurons through changes in synaptic transmission; and the hierarchical organization of memory networks, with a hierarchy of networks in posterior cortex for perceptual memory and another in frontal cortex for executive memory. Sequence Learning characterizes behavioral sequences in terms of their serial, temporal, and abstract structure, and analyzes the associated neural processing systems (see also “Temporal Pattern Processing”). Temporal structure is defined in terms of the durations of elements (and the possible pauses that separate them), and intuitively corresponds to the familiar notion of rhythm. Abstract structure is defined in terms of generative rules that describe relations between repeating elements within a sequence. Thus, the two sequences A-B-C-B-A-C and D-E-F-E-D-F are both generated from the same abstract structure 123-213. 
The article focuses on how the different dimensions of sequence structure can be encoded in neural systems, citing behavioral studies in different patient and control groups and related simulation studies. A recurrent network for manipulating abstract structural relations is implemented in a distributed network that potentially includes the perisylvian cortex in and around Broca’s area. It is argued that both transfer of sequence knowledge between domains and abstract rule representation are likely to be neurophysiological realities.
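The notion of abstract structure used above (e.g., 123-213) is easy to formalize: recode each sequence element by the order of its first appearance, so that two sequences share an abstract structure exactly when they yield the same recoding. A short sketch:

```python
# Two sequences share an abstract structure when the pattern of element
# repetitions is identical, regardless of the elements themselves.
def abstract_structure(seq):
    """Recode each element by the order of its first appearance."""
    codes = {}
    return tuple(codes.setdefault(item, len(codes) + 1) for item in seq)

s1 = abstract_structure("ABCBAC")   # the text's example A-B-C-B-A-C
s2 = abstract_structure("DEFEDF")   # and D-E-F-E-D-F
print(s1, s2, s1 == s2)             # both recode to (1, 2, 3, 2, 1, 3)
```

Transfer of sequence knowledge between domains, as discussed in the article, amounts to a system learning this recoding rather than the specific elements.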
Complementing sequence learning is the study of imitation, the ability to recognize and reproduce others’ actions. Imitation is also related to fundamental capabilities for social cognition such as the recognition of conspecifics, the attribution of others’ intentions, and the ability to deceive and to manipulate others’ states of mind. Imitation bridges between biology and engineering, reviewing the cognitive and neural processes behind the different forms of imitation seen in animals and showing how studies of biological processes influence the design of robot controllers and computational algorithms. Theoretical models have been proposed, for example, to distinguish between purely associative (low-level) and sequential (high-level) imitation. It is argued that modeling of imitation will lead to a better understanding of the neural mechanisms at the basis of social cognition and will offer new perspectives on the evolution of animal abilities for social representation (see “Language Evolution: The Mirror System Hypothesis” for more on evolution and imitation).
Emotional Circuits stresses the distinction between emotional experiences and the underlying processes that lead to emotional experiences. (See also “Motivation” for a discussion of the motivated or goal-directed behaviors that are often accompanied by emotion or affect.) The article is grounded in studies of how the brain detects and evaluates emotional stimuli and how, on the basis of such evaluations, appropriate responses are produced, treating emotion as a function that allows the organism to respond in an adaptive manner to challenges in the environment rather than being inextricably compounded with the subjective experience of emotion. The amygdala is shown to play a major role in the evaluation process. It is argued that fearful stimuli follow two main routes. The fast route involves the thalamo-amygdala pathway and responds best to simple stimulus features, while the slow route involves the thalamo-cortical-amygdala pathway and carries more complex features (such as context). The expression of fear is mediated by the outputs of the amygdala to brainstem and hypothalamus, while the experience of fear involves the prefrontal cortex.
One cerebral hemisphere may perform better than the other for such diverse tasks as language, handedness, visuospatial processing, emotion and its facial expression, olfaction, and attention. Behavioral lateralization has not only been demonstrated in people, but also in rodents, birds, primates, and other animals in areas such as vocalization and motor preferences. Many anatomical, biochemical, and physiological asymmetries exist in the brain, but it is generally unclear which, if any, of these asymmetries actually contribute to hemispheric specialization. Pathways such as the corpus callosum connecting the hemispheres appear to mediate both excitatory and longer-term inhibitory interactions between the hemispheres. Hemispheric Interactions and Specialization first considers models of hemispheric interactions that do not incorporate hemispheric differences, and conversely, models examining the effects of hemispheric differences that do not incorporate hemispheric interactions. It then looks in more detail at recent studies demonstrating how both hemispheric interactions and differences influence the emergence of lateralization in models where lateralization is not initially present.
As we already saw in, e.g., Neurolinguistics, cognitive neuropsychology uses neurological data on the performance of brain-damaged patients to constrain models of normal cognitive function. Lesioned Networks as Models of Neuropsychological Deficits surveys how connectionist techniques have been employed to model the operation and interaction of “modules” inferred from the neurological data. The advantage over “box-and-arrow” models is that removing neurons or connections in connectionist models leads to natural analogues of real brain damage. Moreover, such models let one explore the possibility that processing is actually more distributed and interactive than the older models implied. The article discusses the effects of simulated lesioning on various models, constructed either as feedforward networks or as attractor networks, paying special attention to the misleading artifacts that may arise when large brains are modeled by small ANNs. Continuing with this theme, Neuropsychological Impairments cautions that the inferences that link a neuropsychological impairment to a particular theory in cognitive neuroscience are not as direct as one might at first assume. The brain is a distributed and highly interactive system, such that local damage to one part can unleash new modes of functioning in the remaining parts of the system. The article emphasizes neural network models of cognition and the brain that provide a framework for reasoning about the effects of local lesions in distributed, interactive systems. In many cases a model’s behavior after lesioning is somewhat counterintuitive and so can lead to very different interpretations regarding the nature of the normal system. A model of neglect dyslexia shows how an impairment in a prelexical attentional process could nevertheless show a lexicality effect. Prosopagnosia is an impairment of face recognition that can occur relatively independently of impairments in object recognition. 
The behavior of some prosopagnosic patients seems to suggest that recognition and awareness depend on dissociable and distinct brain systems. However, a model of covert face recognition demonstrates how dissociation may occur without separate systems. Neurological and Psychiatric Disorders shows how neural modeling may be harnessed to investigate the pathogenesis and potential treatment of brain disorders by studying the relation between the “microscopic” pathological alterations of the underlying neural networks and the “macroscopic” functional and behavioral disease manifestations that characterize the network’s function. The article reviews computational studies of the neurological disorders of Alzheimer’s disease, Parkinson’s disease, and stroke, and the psychiatric disorders of schizophrenia and affective disorders.
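The lesioning methodology surveyed above can be illustrated with a deliberately tiny sketch (not any model from the articles): a hand-wired two-category network is damaged by zeroing a random fraction of its connections, and classification accuracy is remeasured. The network and patterns here are illustrative assumptions.

```python
import random

def accuracy(W, patterns):
    # winner-take-all classification over output units
    correct = 0
    for x, label in patterns:
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        correct += scores.index(max(scores)) == label
    return correct / len(patterns)

def lesion(W, fraction, rng):
    # simulated brain damage: zero out a random fraction of connections
    damaged = [row[:] for row in W]
    cells = [(i, j) for i in range(len(W)) for j in range(len(W[0]))]
    for i, j in rng.sample(cells, round(fraction * len(cells))):
        damaged[i][j] = 0.0
    return damaged

# tiny hand-wired network: two input "features" vote for each category
W = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
patterns = [([1, 1, 0, 0], 0), ([0, 0, 1, 1], 1)]

rng = random.Random(1)
intact = accuracy(W, patterns)  # 1.0 before damage
after = [accuracy(lesion(W, f, rng), patterns) for f in (0.25, 0.5, 0.75)]
# accuracy tends to fall off gradually as more connections are removed
```

Even this caricature exhibits the point made above: damage produces graded degradation rather than all-or-none failure, though small networks like this can also produce the misleading artifacts the article warns about.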
Analogy-Based Reasoning and Metaphor
Associative Networks
Cognitive Development
Cognitive Maps
Cognitive Modeling: Psychology and Connectionism
Compositionality in Neural Systems
Concept Learning
Conditioning
Consciousness, Neural Models of
Developmental Disorders
Embodied Cognition
Emotional Circuits
Face Recognition: Psychology and Connectionism
Motivation
Philosophical Issues in Brain Theory and Connectionism
Schema Theory
Systematicity of Generalizations in Connectionist Networks
Much classical psychology was grounded in notions of association—of ideas, or of stimulus and response—which were well developed in the philosophy of Hume, but with roots going back as far as Aristotle. Associative Networks shows how these old ideas gain new power because neural networks can provide mechanisms for the formation of associations that automatically yield many further properties. One of these is that neural networks will in many cases have similar responses to similar inputs, a property that is exploited in the study of Analogy-Based Reasoning and Metaphor. Analogy and metaphor have been characterized as comparison processes that permit one domain to be seen in terms of another. Indeed, many of the advantages suggested for connectionist models—representation completion, similarity-based generalization, graceful degradation, and learning—also apply to analogy, yet analogical processing poses significant challenges for connectionist models. Analogy and metaphor involve structured pattern matching, structured pattern completion, and a focus on common relational structure rather than on common object descriptions. The article analyzes current connectionist models of analogy and metaphor in terms of representations and associated processes, not in terms of brain function. Challenges for future research include building analogical models that can preserve structural relations over incrementally extended analogies and that can be used as components of a broader cognitive system such as one that would perform problem solving. Indeed, people continually deal with composite structures whether they result from aggregation of symbols in a natural language into syllables, words, and sentences or aggregation of visual features into contours, regions, objects, and complete scenes.
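Two of the properties just listed—representation completion and similarity-based generalization—appear already in the simplest associative mechanism. The sketch below (an illustration, not a model from any article) hand-builds a Hebbian associator on two cue–response pairs and shows that a degraded cue retrieves the same response as the stored one.

```python
def associate(pairs, n_in, n_out):
    # Hebbian outer-product learning: strengthen w[i][j] whenever
    # output unit i and input unit j are active together
    W = [[0.0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += y[i] * x[j]
    return W

def recall(W, x):
    # threshold each unit's weighted sum to recover a response pattern
    return [1 if sum(w * xj for w, xj in zip(row, x)) > 0 else 0 for row in W]

pairs = [([1, 1, 0, 0], [1, 0]),
         ([0, 0, 1, 1], [0, 1])]
W = associate(pairs, 4, 2)

stored = recall(W, [1, 1, 0, 0])    # [1, 0]: the learned response
degraded = recall(W, [1, 0, 0, 0])  # [1, 0]: a partial cue completes the same pattern
```

What such an associator lacks, of course, is exactly what the analogy literature demands: sensitivity to relational structure rather than mere feature overlap.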
Compositionality in Neural Systems addresses the question of what sort of neural dynamics allows composite structures to emerge, with the grouping and binding of parts into interpretable wholes. To this day it is still disputed whether ANNs are capable of adequately handling compositional data, and if so, which type of network is most suitable. Basic results have been obtained with simple recurrent networks, but some researchers argue that more complicated dynamics (see, e.g., “Synchronization, Binding and Expectancy”) or dynamics similar to classical symbolic processing mechanisms are necessary for successful modeling of compositionality. In a related vein, Systematicity of Generalizations in Connectionist Networks presents the current “state of play” for Fodor and Pylyshyn’s critique of connectionist architecture. They claimed that human cognitive abilities “come in clumps” (i.e., the abilities are systematically related), and that this systematic relationship does not hold in connectionist networks. The present article examines claims and counterclaims concerning the idea that learning in connectionist architectures can engender systematicity, with special attention paid to studies based on simple recurrent networks (SRNs) and recursive auto-associative memory (RAAM). The conclusion is that, for now, evidence for systematicity in such simple networks is rather limited. (One may ponder the fact that animal brains are vastly more complex than a single SRN or RAAM.)
The “units of thought” afforded by connectionist “neurons” are quite high level compared to the fine-grain computation of the myriad neurons in the human brain, and their properties may hence be closer to those of entire neural networks than to single biological neurons. Moreover, future connectionist accounts of cognition will certainly involve the coordination of connectionist modules (see, e.g., “Hybrid Connectionist/Symbolic Systems”). Schema Theory complements neuroscience’s well-established terminology for levels of structural analysis (brain region, neuron, synapse) with a functional vocabulary, a framework for analysis of behavior with no necessary commitment to hypotheses on the localization of each schema (unit of functional analysis), but which can be linked to a structural analysis whenever appropriate. The article focuses on two issues: structuring perceptual and motor schemas to provide an action-oriented account of behavior and cognition (as relevant to the roboticist as the ethologist), and how schemas describing animal behavior may be mapped to interacting regions of the brain. Schema-based modeling becomes part of neuroscience when constrained by data provided by, e.g., human brain mapping, studies of the effects of brain lesions, or neurophysiology. The resulting model may constitute an adequate explanation in itself or may provide the framework for modeling at the level of neural networks or below. Such a neural schema theory provides a functional/structural decomposition, in strong contrast to models that employ learning rules to train a single neural network to respond as specified by some training set.
Connectionism can apply many different types of ANN techniques to explain psychological phenomena, and the article Cognitive Modeling: Psychology and Connectionism places a sample of these in perspective. The general idea is that much of psychology is better understood in terms of parallel networks of adaptive units than in terms of serial symbol processing, and that connectionism gains much of its power from using very simple units with explicit learning rules. The article points out that connectionist models of cognition can be used both to model cognitive processes and to simulate the performance of tasks and that, unlike many traditional computational models, they are not explicitly programmed by the investigator. However, important aspects of the performance of a connectionist net are controlled by the researcher, so that the achievement of a good fit to the psychological data depends both on the way in which analogs to the data are derived and on the results of “extensional programming,” such as decisions about the selection and presentation of training data. The article also notes the work of “cognitive connectionists,” whose computational experiments have demonstrated the ability of connectionist representations to provide a promisingly different account of important characteristics of cognition (compositionality and systematicity), previously assumed to be the exclusive province of the classical symbolic tradition. Philosophical Issues in Brain Theory and Connectionism asks the following questions: (1) Do neural systems exploit classical compositional and systematic representations, distributed representations, or no representations at all? (2) How do results emerging from neuroscience help constrain cognitive scientific models? (3) In what ways might embodiment, action, and dynamics matter for understanding the mind and the brain? 
There is a growing emphasis on the computational economies afforded by real-world action and the way larger structures (of agents and artifacts) both scaffold and transform the shape of individual reason. However, rather than seeing representations as opposed to interactive dynamics, the article advocates a broader vision of the inner representational resources themselves, stressing the benefits of converging influences from robotics, systems-level neuroscience, cognitive psychology, evolutionary theory, AI, and philosophical analysis. This philosophical theme is further developed in Consciousness, Neural Models of, which reviews the basic ways in which consciousness has been defined, relevant neuropsychological data, and preliminary progress in neural modeling. Among the characteristics needed for consciousness are temporal duration, attentional focus, binding, bodily inputs, salience, past experience, and inner perspective. Brain imaging, as well as insights into single-cell activity and the effects of brain deficits, is leading to a clearer picture of the neural correlates of consciousness. The article presents a specific attention control model of the emergence of awareness in which experience of the prereflective self is identified with the corollary discharge of the attention movement control signal. This signal is posited to reside briefly in its buffer until the arrival of the associated attended input activation at its own buffer. The article concludes by reviewing other neural models of consciousness.
Much of the early work on ANNs was inspired by the problem of “Pattern Recognition” (q.v.). Concept Learning provides a general introduction to recent work, placing such ideas in a psychological perspective. Concepts are mental representations of kinds of objects, events, or ideas. The article focuses on learning mental representations of new concepts from experience and how mental representations of concepts are used to make categorization decisions and other kinds of judgments. The article reviews five types of concept learning models: rule models, prototype models, exemplar models, mixed models, and neuroscientific models. The mechanisms discussed briefly here are developed at greater length in many articles in the road map Learning in Artificial Networks. The psychology of concept learning receives special application in the study of Face Recognition: Psychology and Connectionism, which relates connectionist approaches to face recognition to psychological theories for the subtasks of representing faces and retrieving them from memory, comparing human and model performance along these dimensions.
Many of the concepts of connectionist psychology are strongly related to work in behaviorism, but neural networks provide a stronger “internal structure” than stimulus-response probabilities. Connectionist research has enriched a number of concepts that seemed “anticognitive” by embedding them in mechanisms, namely, neural nets, which can both support internal states and yield stimulus-response pairs as part of a general input-output map. This is shown in Conditioning. During conditioning, animals modify their behavior as a consequence of their experience of the contingencies between environmental events. This article presents formal theories and neural network models that have been proposed to describe classical and operant conditioning. During classical conditioning, animals change their behavior as a result of the contingencies between the conditioned stimulus (CS) and the unconditioned stimulus (US). Contingencies may vary from very simple to extremely complex ones. For example, in Pavlov’s proverbial experiment, dogs were exposed to the sound of a bell (CS) followed by food (US). At the beginning of training, animals salivated (generated an unconditioned response, UR) only when the US was presented. With an increasing number of CS-US pairings, CS presentations elicited a conditioned response (CR). The article discusses variations in the effectiveness of the CS, the US, and the CS and US together, as well as attentional models. During operant (or instrumental) conditioning, animals change their behavior as a result of a triple contingency between their responses (R), discriminative stimuli (SD), and the reinforcer (US). Animals are exposed to the US in a relatively close temporal relationship with the SD and R. As in “Reinforcement Learning” (q.v.), during operant conditioning animals learn by trial and error from feedback that evaluates their behavior but does not indicate the correct behavior.
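The acquisition curve just described—CR strength growing over repeated CS-US pairings—is classically captured by the Rescorla-Wagner rule, one of the best-known formal theories of this kind; the parameter values below are illustrative only.

```python
def rescorla_wagner(n_trials, alpha_beta=0.3, lam=1.0):
    # V is the CS-US associative strength; lam is the asymptote
    # set by the US; alpha_beta combines CS and US salience
    V, history = 0.0, []
    for _ in range(n_trials):
        V += alpha_beta * (lam - V)  # error-correcting update
        history.append(V)
    return history

acquisition = rescorla_wagner(10)
# strength rises toward the asymptote with diminishing gains per trial,
# mirroring the negatively accelerated acquisition of the CR
```

The error term (lam - V) is what gives the rule its explanatory power: once V approaches the asymptote, further pairings produce little change, and a second CS trained in compound with an already-predictive CS gains little strength (blocking).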
The article discusses positive reinforcement and negative reinforcement. Such ideas are further developed in Cognitive Maps. Tolman introduced the notion of a cognitive map to explain animals’ capacity for place learning, latent learning, detours, and shortcuts. In some models, Tolman’s vicarious trial-and-error behavior has been regarded as reflecting the animal’s comparison of different expectancies: at choice points, animals make a decision after sampling the intensity of the activation elicited by the various alternative paths. Other models still use Tolman’s stimulus-approach view and assume that animals approach the place with the strongest appetitive activation, thereby performing a gradient ascent toward the goal. In addition to storing the representation of the environment in terms of the contiguity between places, cognitive maps can store information about differences in height and the type of terrain between adjacent places, contain a priori knowledge of the space to be explored, distinguish between roads taken and those not taken, and keep track of which places have been examined. Neural networks with more than two layers can also be used to represent both the contiguity between places and the relative position of those places. Hierarchical cognitive maps can represent the environment at multiple levels. In contrast to their nonhierarchical counterparts, hierarchical maps can plan navigation in large environments, use a smaller number of connections in their networks, and have shorter decision times.
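A minimal computational reading of a nonhierarchical cognitive map (the place names and layout below are hypothetical) stores only the contiguity between places, as a graph; breadth-first search over that graph already suffices for shortest routes and detours when one route is longer than another.

```python
from collections import deque

def shortest_route(adjacency, start, goal):
    # breadth-first search over the place graph; the map records only
    # which places are contiguous, yet supports route planning
    frontier, visited = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# toy environment: two routes from nest to food, one longer than the other
places = {
    "nest": ["A", "B"],
    "A": ["nest", "food"],
    "B": ["nest", "C"],
    "C": ["B", "food"],
    "food": ["A", "C"],
}
route = shortest_route(places, "nest", "food")  # ["nest", "A", "food"]
```

The hierarchical maps mentioned above would replace the flat adjacency structure with nested graphs (regions containing places), so that planning first chooses a coarse route between regions and only then refines it, shrinking both connection counts and decision times.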
Learning in neural nets can be either supervised or unsupervised, and supervision can be in terms of a specific error signal or some general reinforcement. However, in real animals, these signals seem to have some “heat” to them, which brings us to the issues of motivation and emotion. Motivated or goal-directed behaviors are sets of motor actions that direct an animal toward a particular goal object. Interaction with the goal either promotes the survival of an individual or maintains the species. Motivated behaviors include sleep/wake, ingestive, reproductive, thermoregulatory, and aggressive/defensive behaviors (see also “Pain Networks”). They are often accompanied by emotion or affect. Given the difficulty of defining the terms drive, instinct, and motivation with respect to the neural substrates of behavior, Motivation adopts a neural systems approach that discusses what and how particular parts of the brain contribute to the expression of behaviors that have a motivated character. The approach is based on Hullian incentive models of motivation, where the probability of a particular behavior depends on the integration of information from systems that control circadian timing and regulate arousal state, inputs derived from interosensory information that encode internal state (e.g., hydration state, plasma glucose, leptin, etc.), modulatory hormonal inputs such as gonadal steroids that mediate sexual behavior, and inputs derived from classic sensory modalities. Emotional Circuits analyzes the nature of emotion, emphasizing its role in behavior rather than the subjective feelings that accompany human emotions, then examines the role of brain structures such as the amygdala, the interaction of body and cognitive states, and the status of neural modeling. The expression of fear is seen as mediated by the outputs of the amygdala to lower brain centers (brainstem, hypothalamus), while the experience of fear involves the prefrontal cortex.
Finally, we turn to development, a theme of special concern in connectionist linguistics (see the next road map). Cognitive Development reviews connectionist models of the origins of knowledge, the mechanisms of change, and the task-dependent nature of developing knowledge across a variety of domains. In each case, the models provided explicit instantiations and controlled tests of specific theories of development, and allowed the exploration of complex, emergent phenomena. However, most connectionist models are “fed” their input patterns regardless of what they output, whereas even very young children shape their environments based on how they behave. Moreover, most connectionist models are designed for and tested on a single task within a single domain, whereas children face a multitude of tasks across a range of domains each day. Capturing such features of development will require future models to take in a variety of types of information and learn how to perform successfully across a number of tasks. Developmental Disorders uses the comparison of different abnormal phenotypes to explore further the modeling of the developing mind/brain. The article reviews recent examples of connectionist models of developmental disorders. Autism is a developmental disorder characterized primarily by deficits in social interaction, communication, and imagination, but also by a range of secondary deficits. One hypothesis suggests that these structural deficits are consistent with too few neurons in some brain areas, such as the cerebellum, and too many neurons in other areas, such as the amygdala and hippocampus. This grounds a simple connectionist model trained on categorization tasks linking such differences in neurocomputational constraints to some of the secondary deficits found in autism. Other models relate disordered feature maps or hidden unit numbers to higher-level cognitive deficits that characterize autism. 
Developmental dyslexia has been modeled by changing parameters in models of the normal processes of reading. Another model captures some features of specific language impairment, specifically the difficulty of affected patients in learning rule-based inflectional morphology in verbs, using an attractor network mapping between semantic codes and phonological codes. The article also reports new empirical findings on Williams syndrome patients which reveal a deficit in generalizing knowledge of inflectional patterns to novel forms. Alterations in the initial computational constraints of a connectionist model of past tense development are shown to account for some of the patterns seen in such data, demonstrating how different computational constraints interact in the process of development. Connectionist models thus provide a powerful tool with which to investigate the role of initial computational constraints in determining the trajectory of both typical and atypical development, ensuring that selective deficits in developmental disorders are seen in terms of the outcome of the developmental process itself.
Constituency and Recursion in Language
Convolutional Networks for Images, Speech, and Time Series
Hidden Markov Models
Imaging the Grammatical Brain
Language Acquisition
Language Evolution and Change
Language Evolution: The Mirror System Hypothesis
Language Processing
Motor Theories of Perception
Neurolinguistics
Optimality Theory in Linguistics
Past Tense Learning
Reading
Speech Processing: Psycholinguistics
Speech Production
Speech Recognition Technology
The traditional grounding of linguistics is in grammar, a systematic set of rules for structuring the sentences of a particular language. Much modern work in linguistics has been dominated by the ideas of Noam Chomsky, who placed the notion of grammar in a mathematical framework. His ideas have gone through successive stages in which the formulation of grammars has changed radically. However, two themes have remained stable in the “generative linguistics” that has grown from his work:
There is a universal grammar which defines what makes a language human, and each human language has a grammar that is simply a parametric variation of the universal grammar.
Language is too complicated for a child to learn from scratch; instead a child has universal grammar as an innate mental capacity. When the child hears example sentences of a language, these sentences set parameters in the universal grammar, so that the child can then acquire the grammar of that particular language.
Connectionist linguistics attacks this reasoning on two fronts:
It says that language processing is better understood in terms of connectionist processing, which, as a performance model (i.e., a model of behavior, as distinct from a competence model, which gives a static representation of a body of knowledge), can give an account of errors as well as regularities in language use.
It notes that connectionism has powerful learning tools that Chomsky has chosen to ignore. With those tools, connectionism can model how children could acquire language on the basis of far less specific mental structures than those posited in universal grammar.
Language Processing reviews many applications of connectionist modeling. Despite the insights gained into syntactic structure across languages, the formal study of language has revealed relatively little about learning and development. Thus, as we shall see later in this road map, the connectionist program for understanding language has concentrated on the process of change, exploring topics such as language development, language breakdown, the dynamics of representation in complex systems which themselves may be receiving changing input, and even the evolution of language. The article briefly reviews models of lexical processing (reading single words, recognizing spoken words, and word production) as well as higher-level processing. It concludes that there has been important progress in many areas of connectionist-based research into language processing, and this modeling influences both psychological and neuropsychological experimentation and observation. However, it concedes that the major debates on top-down feedback, on the capacity of connectionist models to capture the productivity and systematicity of human language, and on the degree of modularity in language processing remain to be settled.
Constituency and Recursion in Language then provides more detail on connectionist approaches to syntax. Words group together to form coherent building blocks, constituents, within a sentence, so that “The girl liked a boy” decomposes into “the girl” and “liked a boy,” forming a subject noun phrase (NP) and a verb phrase (VP), respectively. In linguistics, grammar rules such as S → NP VP determine how constituents can be put together to form sentences. To capture the full generativity of human language, recursion needs to be introduced into the grammar. For example, if we add the rules NP → (Det) N (PP) (a noun with optional determiner and prepositional phrase) and PP → Preposition NP, then the rules are recursive, because in this case, NP can invoke rules that eventually call for another instance of NP. This article discusses how constituency and recursion may fit into a connectionist framework, and the possible implications this may have for linguistics and psycholinguistics.
Language Acquisition presents models used by developmental connectionists to support the claim that rich linguistic representations can emerge from the interaction of a relatively simple learning device and a structured linguistic environment. The article reviews connectionist models of lexical development, inflectional morphology, and syntax acquisition, stressing that these models use similar learning algorithms to solve diverse linguistic problems. Past Tense Learning then presents issues in word morphology as a backdrop for a detailed discussion of the prime debate between a rule-based and a connectionist account of language processing, over the forming of regular and irregular past tenses of verbs in English. The dual mechanism model—use the general rule “add-ed” unless an irregular past tense is found in a table of exceptions—was opposed by the view that all past tenses, even for regular verbs, are formed by a connectionist network. The article concludes that most researchers now agree that the mental processing of irregular inflections is not rule governed but rather works much like a connectionist network. Certainly, rules provide an intuitively appealing explanation for regular behavior. Indeed, people are clearly able to consciously identify regularities and describe them with explicit rules that can then be deliberately followed, but this does not imply that a neural encoding of these rules, rather than a connectionist network which yields rule-like behavior, is the better account of “mental reality.” The matter is subtle because the brain is composed of neurons. Thus the issue is not “Does the brain’s language processing use neural networks?” but whether or not the activity of those networks is best described as explicitly encoding a set of rules.
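The dual mechanism model at issue is simple to state procedurally, which is part of its intuitive appeal. The sketch below uses a toy fragment of the exception table and ignores spelling adjustments for regular verbs (e.g., “stop” → “stopped”).

```python
# toy table of exceptions; a real lexicon holds a few hundred irregulars
IRREGULARS = {"go": "went", "sing": "sang", "bring": "brought"}

def past_tense(verb):
    # dual mechanism: consult the table of exceptions first;
    # otherwise apply the general "add -ed" rule
    return IRREGULARS.get(verb, verb + "ed")

regular = past_tense("walk")   # "walked", via the rule
irregular = past_tense("go")   # "went", via the exception table
```

The connectionist alternative replaces both routes with a single network trained on all verbs at once; the debate is over which better describes the mental process, not over what outputs are produced.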
Reading covers connectionist models of reading and associated processes, including the reading disorder known as dyslexia. Where a skilled reader can recognize many thousands of printed words, each in a fraction of a second, with no noticeable effort, a dyslexic child may need great effort to recognize a printed word as a particular word. Most connectionist networks for reading are models of word recognition. However, word recognition is more than an analytic letter-by-letter process that translates spelling into phonology, and so the synthetic-analytic debate provides the organizing theme for this article. The authors argue that, rather than see modeling word recognition as a distinct, separable component of reading, it may be better to investigate more integrative, nonlinear iterative network models. Turning from print to speech, Speech Processing: Psycholinguistics reviews attempts to capture psycholinguistic data using connectionist models, with the primary focus on speech segmentation and word recognition. This article analyzes how far the problem of segmenting speech into words occurs independently of word recognition; considers the interplay of connectionist models of word recognition with empirical research and theory; and assesses the gap that remains between psycholinguistic studies of speech processing and modeling of the human brain. Although data from neuropsychology and functional imaging are becoming increasingly important (see Imaging the Grammatical Brain and Neurolinguistics), the main empirical constraints on psycholinguistic models are derived from laboratory studies of human language processing that are unrelated to neural data. The article suggests that connectionist modeling helps bridge the gulf between psycholinguistics and neuroscience by employing computational models that embody at least some of the computational principles of the brain.
Imaging the Grammatical Brain notes that there is little agreement on the best way to analyze language. Contrary to the connectionist approach (see, e.g., Past Tense Learning), the author sees inventories of combinatorial rules, and stores of complex objects of several types over which these rules operate, as being at the core of language. The “language faculty,” in this view, inheres in a cerebrally represented knowledge base (rule system) and in algorithms that instantiate it. It is divided into levels for the identification and segmentation of speech sounds (universal phonetics), a system that enables the concatenation of phonetic units into sequences (phonology), then into words (morphology, where word structure is computed), sentences (syntax), and meaning (lexical and compositional semantics). The article reviews results emanating from brain imaging that support the neural reality of linguistic rules as a constitutive element of the human language faculty. The focus is on linguistic combinations at the sentence level, but an analysis of cerebral representation of phonological units and of word meaning in its isolated and compositional aspects is provided as background. The study of brain mechanisms supporting language is further advanced in Neurolinguistics. Neurolinguistics began as the study of the language deficits occurring after brain injuries and is rooted in the conceptual model of Broca’s aphasia, Wernicke’s aphasia, and other aphasic syndromes established over a hundred years ago. However, thanks to recent research, critical details are now seen differently, and finer details have been added. Speech and language are now recognized as the products of interacting dynamic systems, with major implications for modeling normal and abnormal performance and for understanding their neural substrates. 
The article analyzes between-stage information flow, dynamics of within-stage processing, unitary representations and activation, and processing by constraint satisfaction. How the cognitive elements (nodes) of psychological theorizing correspond to actual neuronal activity is not known for certain. However, the article suggests that the attractor states that can occur in recurrent networks are viable candidates for behaving as nodes. Indeed, many modeling efforts in neurolinguistics have been concerned with the consequences of relatively large-scale assumptions about stages and connections (see “Lesioned Networks as Models of Neuropsychological Deficits”).
On the output side, Speech Production focuses on work in motor control, dynamical systems and neural networks, and linguistics that is critical to understanding the functional architecture and characteristics of the speech production system. The central point is that spoken word forms are not unstructured wholes but rather are composed from a limited inventory of phonological units that have no independent meaning but that can be (relatively freely) combined and organized in the construction of word forms. The production of speech by the lips, tongue, vocal folds, velum, and respiratory system can thus be understood as arising from choreographed linguistic action units. However, when phonological units are made manifest in word and sentence production, their spatiotemporal realization by the articulatory system, and consequent acoustic character presented to the auditory system, is highly variable and context dependent. The speech production system is sometimes viewed as having two components, one (traditionally referred to as phonology) concerned with categorical and linguistically contrastive information, and the other concerned with gradient, noncontrastive information (traditionally referred to as phonetics). However, current work in connectionist and dynamical systems models blurs this dichotomy. Motor Theories of Perception reviews reasons why speech scientists have doubted the claim that the speech motor system participates in speech perception and then argues against such doubts, showing that the theory accrues credibility when it is set in the larger context of investigations of perception, action, and their coupling. The mirror neurons in primates (see Language Evolution: The Mirror System Hypothesis) are seen as providing an existence proof of neuronal perceptuomotor couplings. 
The article further argues that, although the motor theory of speech perception was motivated by requirements of speaking and listening, real-world functional perception-action coupling is central to the “design” of animals more generally.
We have already contrasted connectionism with rule-based frameworks that account for linguistic patterns through the sequential application of transformations to lexical entries. Optimality Theory in Linguistics introduces optimality theory (OT) as a framework for linguistic analysis that has largely supplanted rule-based frameworks within phonology; it has also been applied to syntax and semantics, though not as widely. Generation of utterances in OT involves two functions, Gen and Eval. Gen takes an input and returns a (possibly infinite) set of output candidates. Some candidates might be identical to the input, others modified somewhat, others unrecognizable. Eval chooses the candidate that best satisfies a set of ranked constraints; this optimal candidate becomes the output. The constraints can conflict, so the constraints’ ranking, which differs from language to language, determines the outcome. One language might eliminate consonant clusters by deleting consonants; another might retain all input consonants. OT was partly inspired by neural networks, employing as it does the ideas of optimization, parallel evaluation, competition, and soft, conflicting constraints. OT can be implemented in a neural network with constraints that are implemented as connection weights. The network implements a Lyapunov function that maximizes “harmony” (the sum, for all pairs i, j of neurons, of the product of the neurons’ activations and their connection weight). Hierarchically structured representations (e.g., consonants and vowels grouped into syllables) can be represented as matrices of neurons, where each matrix is the tensor product of a vector for a linguistic unit and a vector for its position in the hierarchy.
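The harmony measure described above can be sketched directly. In the illustrative Python fragment below (the weights and activations are invented, not drawn from any OT analysis), harmony is computed as the sum over all ordered pairs of units of activation times connection weight times activation; a network that settles into states maximizing this quantity implements Eval's soft-constraint competition.

```python
def harmony(a, w):
    """Harmony of activation vector a under weight matrix w:
    the sum, over all pairs (i, j), of a[i] * w[i][j] * a[j].
    (Some formulations sum only over i < j, halving this value.)"""
    n = len(a)
    return sum(a[i] * w[i][j] * a[j] for i in range(n) for j in range(n))

# Hypothetical 3-unit network with symmetric weights: units 0 and 1
# support each other (a soft constraint satisfied when both are active),
# while units 0 and 2 conflict.
w = [[0.0, 1.0, -0.5],
     [1.0, 0.0,  0.3],
     [-0.5, 0.3, 0.0]]
h = harmony([1.0, 1.0, 0.0], w)   # activating the compatible pair
```

Here the state activating the mutually supportive pair has higher harmony than one activating the conflicting pair, which is the sense in which the ranked-constraint competition is "soft."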
An approach to language that emphasizes the learning processes of each new speaker rather than the existence of a set of immutable rules shared by all humans seems well equipped to approach the issue of how a language changes from generation to generation. Computational modeling has been used to test competing theories about specific aspects of language evolution under controlled circumstances. Connectionist networks have been used as simulated agents to study how social transmission via learning may give rise to the evolution of structured communication systems. In other cases, properties of neural network learning are enlisted to help illuminate the constraints and processes that may have been involved in the evolution of language. Language Evolution and Change surveys this connectionist research, starting from the emergence of early syntax and continuing on to the role of social interaction and constraints on network learning in subsequent evolution of language. It also discusses linguistic change within existing languages, showing how the inherent generalization ability of neural networks makes certain errors in language transmission from one generation to the next more likely than others. (However, such models say more about the simplification of grammars than about how language complexity arises in the first place.) Where this article stresses computational efficacy of various models proposed for the emergence of features characteristic of current human languages, Language Evolution: The Mirror System Hypothesis focuses on brain mechanisms shared by humans with other primates, and seeks to explain how these generic mechanisms might have become specialized during hominid evolution to support language. It is argued that imitation and pantomime provide a crucial bridging capability between general primate capabilities for action recognition and the language readiness of the human brain.
At present, the state of play may be summarized as follows: generative linguistics has shown how to provide grammatical rules that explain many subtle sentence constructions of English and many other languages, revealing commonalities and differences between languages, with the differences in some cases being reduced to very elegant and compact formulations in terms of general rules with parametric variations. However, in offering the notion of universal grammar as the substrate for language acquisition, generative linguistics ignores issues of learning that must, in any case, be faced in explaining how children acquire the large and idiosyncratic vocabulary of their native tongue. Connectionist linguistics, on the other hand, has made great strides in bringing learning to the center, not only showing how specific language skills (e.g., use of the past tense) may be acquired, but also providing insight into psycholinguistics, the study of language behavior. However, connectionist linguistics still faces two major hurdles: it lacks the systematic overview of language provided by generative linguistics, and little progress has been made in developing a neurolinguistic theory of the contributions of specific brain regions to language capabilities. It is one thing to train an ANN to yield a convincing model of performance on the past tense; it is quite another to offer an account of how this skill interfaces with all the other aspects of language, and what neural substrates are necessary for their acquisition by the human child.
The remaining articles look at speech processing from a technological perspective rather than in relation to human psycholinguistic data. Speech Recognition Technology introduces the way computer systems that transcribe speech waveforms into words rely on digital signal processing and statistical modeling methods to analyze and model the speech signal. Although commercial technology is typically not based on connectionist methods, neural network processing is commonly seen as a promising alternative to some of the current algorithms, and the article focuses on speech recognizers that process large-vocabulary continuous speech and that use multilayer feedforward neural networks. Traditional speech recognition systems follow a hierarchical architecture. A grammar specifies the sentences allowed by the application. (Alternatively, for very large vocabulary systems, a statistical language model may be used to define the probabilities of various word sequences in the domain of application.) Each word allowed by the grammar is listed in a dictionary that specifies its possible pronunciations in terms of sequences of phonemes, which are further decomposed into smaller units whose acoustic realizations are represented by statistical acoustic models. When a speech waveform is input to a recognizer, it is first processed by a front-end unit that extracts a sequence of observations, or “features,” from the raw signal. This sequence of observations is then decoded into the sequence of speech units whose acoustic models best fit the observations and that respect the constraints imposed by the dictionary and language model. Hidden Markov models (HMMs) have been an essential part of the toolkit for continuous speech recognition, as well as other complex temporal pattern recognition problems such as cursive (handwritten) text recognition, time-series prediction, and biological sequence analysis.
Hidden Markov Models describes the use of deterministic and stochastic finite state automata for sequence processing, with special attention to HMMs as tools for the processing of complex piecewise stationary sequences. It also describes a few applications of ANNs to further improve these methods. HMMs allow complex learning problems to be solved by assuming that the sequential pattern can be decomposed into piecewise stationary segments, with each stationary segment parameterized in terms of a stochastic function. The HMM is called “hidden” because there is an underlying stochastic process (i.e., the sequence of states) that is not directly observable but that nonetheless affects the observed sequence of events. Convolutional Networks for Images, Speech, and Time Series shows how shift invariance is obtained in convolutional networks by forcing the replication of weight configurations across space. This takes the topology of the input into account, enabling such networks to force the extraction of local features by restricting the receptive fields of hidden units to be local, and enforcing a built-in invariance with respect to translations, or local distortions of the inputs.
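The decoding step at the heart of HMM-based recognition can be illustrated with the Viterbi algorithm, which recovers the most probable hidden-state sequence for an observed sequence. The sketch below is deliberately minimal: a toy two-state model with made-up probabilities, whereas real recognizers work in log space over lattices constrained by the dictionary and language model.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for an observation sequence."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max((V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                             for r in states)
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Hypothetical two-state HMM emitting symbols 'x' and 'y'.
states = ('A', 'B')
start = {'A': 0.9, 'B': 0.1}
trans = {'A': {'A': 0.7, 'B': 0.3}, 'B': {'A': 0.4, 'B': 0.6}}
emit = {'A': {'x': 0.8, 'y': 0.2}, 'B': {'x': 0.1, 'y': 0.9}}
path = viterbi(['x', 'x', 'y'], states, start, trans, emit)
```

The hidden states here stand in for the piecewise stationary segments the article describes; only the emitted symbols are observable, yet the dynamic program recovers the underlying state sequence that best explains them.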
Artificial Intelligence and Neural Networks
Bayesian Networks
Competitive Queuing for Planning and Serial Performance
Compositionality in Neural Systems
Connectionist and Symbolic Representations
Decision Support Systems and Expert Systems
Dynamic Link Architecture
Graphical Models: Parameter Learning
Graphical Models: Probabilistic Inference
Graphical Models: Structure Learning
Hybrid Connectionist/Symbolic Systems
Memory-Based Reasoning
Multiagent Systems
Schema Theory
Semantic Networks
Structured Connectionist Models
Systematicity of Generalizations in Connectionist Networks
In the 1950s, the precursors of today’s fields of artificial intelligence and neural networks were still subsumed under the general heading of cybernetics. Much of the work in the 1960s sought to distance artificial intelligence (AI) from its cybernetic roots, emphasizing models of, e.g., logical inference, game playing, and problem solving that were based on explicit symbolic representations manipulated by serial computer programs. However, work in computer vision and in robotics (discussed in the road maps Vision and Robotics and Control Theory, respectively) showed that this distinction was never entirely convincing, since these were areas of AI that made use of parallel computation and numerical transformations. For a while, a case could be made that the use of parallelism might be appropriate for peripheral sensing and motor control but not for the “central” processes involved in “real” intelligence. However, work from at least the mid-1970s onward has made this fallback position untenable. For example, in the HEARSAY system, speech understanding was achieved not by serial manipulation of symbolic structures but by the action (implicitly distributed, though in the 1970s still implemented on a serial computer) of knowledge sources (what we would now call “agents”) to update numerical confidence levels of multiple hypotheses distributed across a set of “levels” in a data structure known as a blackboard. Multiagent Systems introduces the methodology that has grown out of such beginnings. What constitutes an “individual” can be highly subjective: an individual to one researcher can, to another, be a complex distributed system composed of finer-grained agents. Research in brain theory has dealt with different levels, from neurons to brain regions to humans, whereas AI work in multi-agent systems has focused on coarse-grained levels of individuality and interaction, where the goal is to draw upon sociological, political, and economic insights.
The article is designed to survey enough of this work on multi-agent systems to foster comparisons between the ANN, brain theory, and multi-agent approaches. A crucial notion is that agents either have or learn models of the agents with which they interact. These models allow agents to avoid dealing with malicious or broken agents. Agents may even build nested models of the other agents that include an agent’s models of other agents, and so on. By using their models of each other, the agents loosely organize themselves into self-reinforcing communities of trust, avoiding unproductive future interactions with other agents. In another branch of AI, work on expert systems—information systems that represent expert knowledge for a particular problem area as a set of rules, and that perform inferences when new data are entered—provided an important application success in which numerical confidence values played a role, but with the emphasis still on manipulation of hypotheses through the serial application of explicit rules. As shown in Decision Support Systems and Expert Systems, we now see many cases in which the application of separate rules is replaced by transformations effected in parallel by (trainable) neural networks. A decision system is either a decision support system or an expert system in the classic AI sense. The article reviews results on connectionist-based decision systems. In particular, trainable knowledge-based neural networks can be used to accumulate both knowledge (rules) and data, building adaptive decision systems with incremental, on-line learning.
As the general overview article Artificial Intelligence and Neural Networks makes clear, there are many problems for which the (not necessarily serial) manipulation of symbolic structures can still outperform connectionist approaches, at least with today’s software running on today’s hardware. Nonetheless, if we define AI by the range of problems it is to solve—or the “packets of intelligence” it is to implement—then it is no longer useful to define it in opposition to connectionism. In general, the technologist facing a specific problem should choose between, or should combine, connectionist and symbolic approaches on the basis of efficacy, not ideology. On occasion, for rhetorical purposes, authors will use the term AI for a serial symbolic methodology distinct from connectionism. However, we will generally use it in an extended sense of a technology that seeks to realize aspects of intelligence in machines by whatever methods work best. The term symbolic AI will then be used for the “classical” approach. The article examines the relative merits of symbolic AI systems and neural networks, and ways of attempting to bridge between the two. In brain theory, everything, whether symbolic or not, is, in the final analysis, implemented in a neural network. But even here, an analysis of the brain will often best be conducted in terms of interacting subsystems that are not all fully explicated in neural network terms. Schema Theory complements neuroscience’s well-established terminology for levels of structural analysis (brain region, neuron, synapse) with a framework for analysis of behavior with no necessary commitment to hypotheses on the localization of each schema (unit of functional analysis), but which can be linked to a structural analysis whenever appropriate. 
The article focuses on two issues: structuring perceptual and motor schemas to provide an action-oriented account of behavior and cognition (as relevant to the roboticist as the ethologist), and how schemas describing animal behavior may be mapped to interacting regions of the brain. Schema-based modeling becomes part of neuroscience when constrained by data provided by, e.g., human brain mapping, studies of the effects of brain lesions, or neurophysiology. The resulting model may constitute an adequate explanation in itself or may provide the framework for modeling at the level of neural networks or below. Such a neural schema theory provides a functional/structural decomposition, in strong contrast to models that employ learning rules to train a single, otherwise undifferentiated, neural network to respond as specified by some training set. Hybrid Connectionist/Symbolic Systems reviews work on hybrid systems that integrate neural (ANN) and symbolic processes. Cognitive processes are not homogeneous, and so some are best captured by symbolic models and others by connectionist models. Correspondingly, from a technological viewpoint, AI systems for practical applications can benefit greatly from a proper combination of different techniques combining, e.g., symbolic models (for capturing explicit knowledge) and connectionist models (for capturing implicit knowledge).
Use of the term systematicity in relation to connectionist networks originated with Fodor and Pylyshyn’s critique of connectionist architecture. They claimed that human cognitive abilities are systematically related in a way that does not hold in connectionist networks, unlike formal systems akin to propositional logic. Systematicity of Generalizations in Connectionist Networks starts by noting that this critique made no reference to learning-based generalization, and then proceeds to examine claims and counterclaims concerning whether learning in connectionist architectures can engender systematicity. Special attention is paid to studies based on simple recurrent networks (SRNs) and recursive auto-associative memory (RAAM). The article suggests that, for now, evidence for systematicity in such simple networks is rather limited. Perhaps this is not so surprising, given that there is little evidence of systematicity in most animals, and animal brains are vastly more complex than SRNs or RAAMs. Compare “Language Evolution: The Mirror System Hypothesis” for a discussion of how evolution may have shaped the human brain to extend capabilities shared with other species to yield novel human cognitive abilities.
The notion of representation plays a central role in AI. As discussed in Semantic Networks, one classic form of representation in AI is the semantic network, in which nodes represent concepts and links represent relations between them. Semantic networks were originally developed for couching “semantic” information, either in the psychologist’s sense of static information about concepts or in the semanticist’s sense of the meanings of natural language sentences. However, they are also used as a general knowledge representation tool. The more elaborate types of semantic networks are similar in their representational abilities to sophisticated forms of symbolic logic. The article discusses various ways of implementing or emulating semantic networks in neural networks, and of forming hybrid semantic network-neural network systems. Structured Connectionist Models emphasizes those neural networks in which the translation from symbolic to neural is fairly direct: nodes become “neurons,” but now processing is done by neural interactions rather than by an “inference engine” acting on a passive representation. At the other extreme, certain neural networks (connectionist, rather than biological) may transform input “questions” to output “answers” via the distributed activity of neurons whose firing conditions have no direct relationship to the concepts that might normally arise in a logical analysis of the problem (cf. “Past Tense Learning”). In the fully distributed version of the latter approach, each “item” (concept or mental object) is represented as a pattern of activity distributed over a common pool of nodes. However, if “John” and “Mary,” for example, are represented as patterns of activity over the entire network such that each node in the network has a specific value in the patterns for “John” and “Mary,” respectively, then how can the network represent “John” and “Mary” at the same time? 
To address such problems, the structured approach often employs small clusters of nodes that act as “focal” nodes for concepts and provide access to more elaborate structures that make up the detailed encoding of concepts (cf. “Localized Versus Distributed Representations”). The discussion of these varying styles of representation is continued in Connectionist and Symbolic Representations. In symbolic representations, the heart of mathematics and many models of cognition, symbols are meaningless entities to which arbitrary significance may be assigned. Composing ordered tuples from symbols and other tuples allows us to create an infinitude of complex structures from a finite set of tokens and combination rules. Inference in the symbolic framework is founded on structural comparison and rule-governed manipulation of these objects. However, AI makes extensive use of nondeductive reasoning methods. Symbolists have moved to more complex formalizations of cognitive processes, using heuristic and unsound inference rules. Connectionists explore a radical alternative: that cognitive processes are mere epiphenomena of a completely different type of underlying system, whose operations can never be adequately formalized in symbolic language. The article examines representation and processing issues in the connectionist move from classical discrete, set-theoretic semantics to a continuous, statistical, vector-based semantics.
In symbolic AI, two concepts can be linked by providing a pointer between them. In a neural net, the problem of “binding” the two patterns of activity that represent the concepts is a more subtle one, and several models address the use of rapidly changing synaptic strengths to provide temporary “assemblages” of currently related data. This theme is developed not only in Structured Connectionist Models, but also in the articles Compositionality in Neural Systems (how can inferences about a structure be based on the way it is composed of various elements?), and “Object Structure, Visual Processing” (combining visual elements of an object into a recognizable whole). Dynamic Link Architecture, the basic methodology, views the brain’s data structure as a graph composed of nodes connected by links. Both units and links bear activity variables changing on the rapid time scale of fractions of a second. The nodes play the role of symbolic elements. The intensity of activity measures the degree to which a node is active in a given time interval, signifying the degree to which the meaning of the node is alive in the mind of the animal, while correlations of activity between nodes quantify the degree to which the signal of one node is related to that of others. The strength of links can change on two time scales, represented by two variables called temporary weight and permanent weight. The permanent weight corresponds to the usual synaptic weight, can change on the slow time scale of learning, and represents permanent memory. The temporary weight can change on the same time scale as the node activity—it is what makes the link dynamic. On this view, dynamic links constitute the glue by which higher data structures are built up from more elementary ones.
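The two time scales of a dynamic link can be caricatured in a few lines. In this purely illustrative sketch (the rates and decay constant are invented, not taken from any published model), the temporary weight tracks the momentary correlation of the two node activities on a fast time scale, while the permanent weight integrates it slowly, as in ordinary Hebbian learning.

```python
def update_link(temp_w, perm_w, act_i, act_j,
                fast_rate=0.5, slow_rate=0.01, decay=0.2):
    """One time step of a (hypothetical) dynamic link update."""
    corr = act_i * act_j                                # momentary correlation
    temp_w = (1 - decay) * temp_w + fast_rate * corr    # fast: follows activity
    perm_w = perm_w + slow_rate * temp_w                # slow: permanent memory
    return temp_w, perm_w

# Sustained correlated activity rapidly strengthens the temporary weight
# (the "glue" binding two nodes), while the permanent weight creeps up
# far more slowly, on the time scale of learning.
tw, pw = 0.0, 0.0
for _ in range(10):
    tw, pw = update_link(tw, pw, 1.0, 1.0)
```

After ten steps of correlated activity the temporary weight is close to its asymptote while the permanent weight has barely moved, capturing the contrast between fast binding and slow learning.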
Complementing the theme of representation in symbolic AI has been that of planning, going from (representations of) the current state and some desired state to a sequence of operations that will transform the former to the latter. Competitive Queuing for Planning and Serial Performance presents neural network studies based on two assumptions: that more than one plan representation can be simultaneously active in a planning layer, and that which plan to enact next is chosen as the most active plan representation by a competition in a second neural layer. Once a plan wins the competition and is used to initiate a response, its representation is deleted from the field of competitors in the planning layer, and the competition is re-run. This iteration allows the two-layer network to transform an initial activity distribution across plan representations into a serial performance. Such models provide a very different basis for control of serial behavior than that given by recurrent neural networks. The article suggests that such a system was probably an ancient invention in the evolution of animals yet may still serve as a viable core for the highest levels of planning and skilled sequencing exhibited by humans.
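The choose-perform-delete cycle just described reduces to a few lines of code. This sketch (plan names and activation levels are invented) shows how a static activation gradient across the planning layer is transformed into a serial performance.

```python
def competitive_queuing(plan_activations):
    """Turn a parallel activation gradient into a sequence: repeatedly
    let the most active plan win the competition, perform it, and
    delete it from the field of competitors."""
    field = dict(plan_activations)
    sequence = []
    while field:
        winner = max(field, key=field.get)
        sequence.append(winner)        # winner initiates a response...
        del field[winner]              # ...and is suppressed; competition reruns
    return sequence

order = competitive_queuing({'reach': 0.4, 'grasp': 0.9, 'lift': 0.6})
```

Note that the serial order is carried entirely by the initial activity distribution, not by chained associations between successive items, which is what distinguishes this scheme from recurrent-network accounts of serial behavior.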
The final articles in this road map are not on neural nets per se, but instead provide related methods that add to the array of techniques extending AI beyond the serial, rule-based approach. Bayesian Networks provides an explicit method for following chains of probabilistic inference such as those appropriate to expert systems, extending Bayes’s rule for updating probabilities in the light of new evidence. The nodes in a Bayesian network represent propositional variables of interest and the links represent informational or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node given its parents in the network. The network supports the computation of the probabilities of any subset of variables given evidence about any other subset, and the reasoning processes can operate on Bayesian networks by propagating information in any direction. Graphical Models: Probabilistic Inference introduces the graphical models framework, which has made it possible to understand the relationships among a wide variety of network-based approaches to computation, and in particular to understand many neural network algorithms and architectures as instances of a broader probabilistic methodology. Graphical models use graphs to represent and manipulate joint probability distributions. The graph underlying a graphical model may be directed, in which case the model is often referred to as a belief network or a Bayesian network, or the graph may be undirected, in which case the model is generally referred to as a Markov random field. The articles Graphical Models: Structure Learning and Graphical Models: Parameter Learning present learning algorithms that build on these inference algorithms and allow parameters and structures to be estimated from data. (A fuller précis of the three articles on graphical models can be found in the road map Learning in Artificial Networks.) 
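Inference in a small Bayesian network can be carried out by brute-force enumeration over the joint distribution, which the conditional probabilities at each node define compactly. The three-node chain below is entirely hypothetical (variable names and probabilities are invented), but it shows how evidence about one variable updates belief in another along the links.

```python
# Hypothetical Boolean chain: Cloudy -> Rain -> WetGrass.
P_CLOUDY = 0.5
P_RAIN = {True: 0.8, False: 0.2}     # P(rain | cloudy)
P_WET = {True: 0.9, False: 0.1}      # P(wet | rain)

def joint(c, r, w):
    """Joint probability, factored by the network's dependencies."""
    p = P_CLOUDY if c else 1 - P_CLOUDY
    p *= P_RAIN[c] if r else 1 - P_RAIN[c]
    p *= P_WET[r] if w else 1 - P_WET[r]
    return p

def prob_rain_given_wet():
    """P(Rain | WetGrass=True): condition on the evidence and
    marginalize out Cloudy, per Bayes's rule."""
    num = sum(joint(c, True, True) for c in (True, False))
    den = sum(joint(c, r, True) for c in (True, False) for r in (True, False))
    return num / den
```

Here the prior belief in rain is 0.5, but observing wet grass raises it to 0.9; the same machinery supports reasoning in either direction along the links, as the article emphasizes. (Practical algorithms avoid full enumeration, whose cost grows exponentially with the number of variables.)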
Finally, Memory-Based Reasoning applies massively parallel computing to answer questions about a new situation by searching for data on the most similar stored instances. Memory-based reasoning (MBR) refers to a family of nearest-neighbor-like methods for making decisions or classifications. Where nearest-neighbor methods generally use a simple overlap distance metric, MBR uses variants of the value distance metric. MBR and neural nets form decision surfaces differently, and so will perform differently. MBR can become arbitrarily accurate if large numbers of cases are available, and if these cases are well behaved and properly categorized, whereas neural nets cannot respond well to isolated cases but tend to be good at smooth extrapolation. For each article reviewed in this paragraph, the reader may ponder whether these methods are alternatives to connectionist AI, or whether they can contribute to the emergence of a technologically efficacious hybrid. As stated before, where brain theory seeks to know “how the brain does it,” AI must weigh the value of ANNs as a powerful technology for parallel, adaptive computation against that of other technologies on the basis of efficacy in solving practical problems on available hardware.
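The value distance metric mentioned above can be estimated from the stored cases themselves: two values of a symbolic feature count as close when they predict similar class distributions. The sketch below is a simplification (the data are invented, and real MBR systems combine such per-feature distances across many features and weight them).

```python
from collections import Counter, defaultdict

def make_value_distance(feature_values, labels):
    """Value-distance metric for one symbolic feature, estimated from
    stored cases: d(v1, v2) = sum over classes c of
    |P(c | v1) - P(c | v2)|."""
    counts = defaultdict(Counter)
    for v, c in zip(feature_values, labels):
        counts[v][c] += 1
    classes = set(labels)
    def dist(v1, v2):
        n1 = sum(counts[v1].values())
        n2 = sum(counts[v2].values())
        return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2)
                   for c in classes)
    return dist

# Value 'a' always predicts class 1; value 'b' is ambiguous, so the two
# values are far apart even though a simple overlap metric would only
# register that they differ.
dist = make_value_distance(['a', 'a', 'b', 'b'], [1, 1, 1, 0])
```

Unlike the overlap metric, which treats any two distinct values as equally different, this metric grades the difference by its predictive consequences, which is what lets MBR become more accurate as more well-categorized cases accumulate.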
Activity-Dependent Regulation of Neuronal Conductances
Axonal Modeling
Biophysical Mechanisms in Neuronal Modeling
Biophysical Mosaic of the Neuron
Dendritic Processing
Dendritic Spines
Diffusion Models of Neuron Activity
Ion Channels: Keys to Neuronal Specialization
Neocortex: Basic Neuron Types
Neocortex: Chemical and Electrical Synapses
Oscillatory and Bursting Properties of Neurons
Perspective on Neuron Model Complexity
Single-Cell Models
Synaptic Interactions
Synaptic Noise and Chaos in Vertebrate Neurons
Synaptic Transmission
Temporal Dynamics of Biological Synapses
Nearly all the articles in the road maps Psychology, Linguistics and Speech Processing, and Artificial Intelligence discuss networks made of very simple neurons describable by a single internal variable, either binary or real-valued (the “membrane potential”) and that communicate with other neurons by a simple (generally nonlinear) function of that variable, sometimes referred to as the firing rate. Incoming signals are usually summed linearly via “synaptic weights,” and these weights in turn may be adjusted by simple learning rules, such as the Hebbian rule, the perceptron rule, or a reinforcement learning rule. Such simplifications remain valuable both for technological application of ANNs and for approximate models of large biological networks. Nonetheless, biological neurons are vastly more complex than these single-compartment models suggest. An appreciation of this complexity is necessary for the computational neuroscientist wishing to address the increasingly detailed database of experimental neuroscience. It is also important for the technologist looking ahead to the incorporation of new capabilities into the next generation of ANNs.
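The simple neuron just described fits in a few lines, which makes the contrast with the biophysical models of this road map vivid. In this sketch (weights, inputs, and learning rate are arbitrary), the unit's single internal variable is passed through a logistic nonlinearity to give a firing rate, and a Hebbian rule grows each weight with the product of presynaptic input and postsynaptic rate.

```python
import math

def firing_rate(inputs, weights, bias=0.0):
    """Rate of a single-variable unit: a nonlinear (logistic)
    function of the linearly weighted sum of its inputs."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias   # "membrane potential"
    return 1.0 / (1.0 + math.exp(-v))

def hebbian_update(weights, inputs, rate, eta=0.1):
    """Hebbian rule: strengthen each weight by the product of
    presynaptic activity and postsynaptic firing rate."""
    return [w + eta * x * rate for w, x in zip(weights, inputs)]

rate = firing_rate([1.0, 0.0], [2.0, -1.0])
new_w = hebbian_update([2.0, -1.0], [1.0, 0.0], rate)
```

Everything in the remainder of this road map concerns what this caricature leaves out: active dendrites, diverse ion channels, synaptic dynamics, and the rest.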
The neocortex is functionally parcellated into vertical columns (∼0.5 mm in diameter) traversing all six layers. These columns have no obvious anatomical boundaries, and the topographic mapping of afferent and efferent pathways probably determines their locations and dimensions as well as their functions. Neocortex: Basic Neuron Types shows that these apparently stereotypical microcircuits are composed of a daunting variety of precisely and intricately interconnected neurons and argues that this neuronal diversification may provide a foundation for maximizing the computational abilities of the neocortex. All anatomical cell types can display multiple discharge patterns and molecular expression profiles. Different cell types are synaptically interconnected according to complex organizational principles to form intricate stereotypical microcircuits. The article challenges neural network modelers to incorporate and account for this cellular diversity and the role of different cells in the computational capability of cortical microcircuits. Neocortex: Chemical and Electrical Synapses summarizes the diverse functional properties of synapses in neocortex. These synapses tend to be small, but their structure and biochemistry are complex. Both chemical and electrical synapses exist in neocortex. Chemical synapses are the “usual synapses” of neural network models, and are far more abundant. They use a chemical neurotransmitter that is packaged presynaptically into vesicles, released in quantal (vesicle-multiple) amounts, and binds to postsynaptic receptors that either open an ion channel directly (ligand-gated ion channels) or modulate the channel through an intracellular molecule that links the activated receptor to the opening or closing of the channel.
The latter molecule is called a “second messenger,” to contrast it with the case in which the transmitter itself provides a “primary message” that acts directly on the channel, in this case called “ligand-gated.” Second-messenger-based synaptic interaction occurs on a slower time scale than ligand-gated interaction and is called neuromodulation, since it may modulate the behavior of the postsynaptic neuron over a time scale of seconds or minutes rather than milliseconds (cf. “Neuromodulation in Invertebrate Nervous Systems” and “Neuromodulation in Mammalian Nervous Systems”). The essential element of an electrical synapse is a protein called a connexin; 12 connexins form a single intercytoplasmic ion channel, and a cluster of such channels constitutes a gap junction. Electrical synapses provide a direct pathway that allows ionic current or small organic molecules to flow from the cytoplasm of one cell to that of another. Short-term dynamics allow synapses to serve as temporal filters of neural activity. Long-term synaptic plasticity provides specific, localized substrates for various forms of memory. Modulation of synaptic function by neurotransmitters (see “Neuromodulation in Mammalian Nervous Systems”) provides a mechanism for globally altering the properties of a neural circuit during changes of behavioral state. Each of these functions has diverse forms that vary between synapses, depending on their site within the cortical circuit (and elsewhere in the brain).
Perspective on Neuron Model Complexity discusses the wide range of model complexity, from very simple to rather complex neuron models. Which model to choose depends, in each case, on the context, such as how much information we already have about the neurons under consideration and what questions we wish to answer. The use of more realistic neuron models when seeking functional insights into biological nervous systems does not mean choosing the most complex model, at least in the sense of including all known anatomical and physiological details. Rather, the key is to preserve the most significant distinctions between regions (soma, proximal dendritic, distal dendritic, etc.), using “compartmental modeling,” whereby one compartment represents each functionally distinct region. Single-Cell Models starts by reviewing the “simple” models of Part I (the McCulloch-Pitts, perceptron, and Hopfield models) and the slightly more complex polynomial neuron. It then turns to more realistic biophysical models, most of which are explored in further detail in this road map. These include the Hodgkin-Huxley model of squid axon, integrate-and-fire models, modified single-point models, cable and compartmental models, and models of synaptic conductances.
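Among the models listed, the integrate-and-fire neuron is simple enough to sketch completely. In the illustrative leaky integrate-and-fire simulation below (all parameter values are conventional round numbers, not fitted to any real cell), the membrane potential decays toward rest, integrates injected current, and is reset after each threshold crossing.

```python
def simulate_lif(current, dt=0.1, tau_m=10.0, v_rest=-65.0,
                 v_thresh=-50.0, v_reset=-65.0, r_m=10.0):
    """Leaky integrate-and-fire neuron; returns spike times (ms).
    current is a list of input samples (one per time step dt)."""
    v = v_rest
    spikes = []
    for k, i_in in enumerate(current):
        # dv/dt = (-(v - v_rest) + R_m * I) / tau_m   (forward Euler)
        v += dt * (-(v - v_rest) + r_m * i_in) / tau_m
        if v >= v_thresh:
            spikes.append(k * dt)
            v = v_reset          # spike shape is not modeled, only its time
    return spikes

# A constant suprathreshold input drives regular firing over 100 ms;
# with no input the cell stays silent at rest.
spikes = simulate_lif([2.0] * 1000)
```

The model replaces the entire spike-generating machinery with a threshold and reset, which is precisely the trade-off the article discusses: adequate for questions about spike timing and input integration, silent about the biophysics that the Hodgkin-Huxley formalism captures.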
Before turning to a detailed analysis of mechanisms of neuronal function, we first consider an article that offers a high-level view of the neuron, but this time a stochastic one. Most nerve cells encode their output as a series of action potentials, or spikes, that originate at or close to the cell body and propagate down the axon at constant velocity and amplitude. Diffusion Models of Neuron Activity models the membrane potential of a single neuron as a stochastic process that will eventually bring it to the threshold for spike initiation. This leads to the first-passage-time problem: inferring the distribution of neuronal spiking based on the "first passage" of the membrane potential from its resting value to threshold. In addition to using stochastic differential equations, the article shows how the Wiener and Ornstein-Uhlenbeck neuronal models can be obtained as the limit of a Markov process with a discrete state space. Besides these models, characterized by additive noise terms appearing in the corresponding stochastic differential equations, the article also reviews diffusion models with multiplicative noise, showing that these can be used not only for the description of steady-state firing under constant stimulation, but also for effects of periodic stimulation.
Now for the details of neuronal function. The ionic mechanisms underlying the initiation and propagation of action potentials were elucidated in the squid giant axon by a number of workers, most notably Hodgkin and Huxley. Variations on the Hodgkin-Huxley equations underlie the vast majority of contemporary biophysical models. Axonal Modeling describes this model and its assumptions, introduces the two classes of axons (myelinated and nonmyelinated) found in most animals, and concludes by briefly commenting on the possible functions of axonal branching in information processing. The Hodgkin-Huxley equations were brilliantly inferred from detailed experiments on conduction of nerve impulses. Much research since then has revealed that the basis for these equations is provided by "channels," structures built from a few macromolecules and embedded in the neuron which, in a voltage-dependent way, can selectively allow different ions to pass through the cell membrane to change the neuron's membrane potential. Similarly, channels (also known in this case as receptors) in the postsynaptic membrane can respond to neurotransmitters, chemicals released from the presynaptic membrane, to change the neuron's local membrane potential in response to presynaptic input. These changes, local to the synapse, must propagate down the dendrites and across the cell body to help determine whether or not the axon will "pass threshold" and generate an action potential. Ion Channels: Keys to Neuronal Specialization notes that channels not only produce action potentials but can set a particular firing pattern, latency, rhythm, or oscillation for the firing of these spikes. Each neuronal class is endowed with a different set of channels, and the diversity of channels between different types of neurons explains the functional classes of neurons found in the brain. Some neurons fire spontaneously, some show adaptation, some fire in bursts, and so on.
Therefore, a channel-based cellular physiology is relevant to questions about the role of different brain regions in overall function.
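The Hodgkin-Huxley equations themselves can be integrated in a few lines. The sketch below uses the standard squid-axon parameters (in the convention with rest near -65 mV) and simple forward-Euler integration; it is a pedagogical sketch, not a substitute for the careful numerical methods the modeling articles discuss.

```python
import math

# Standard Hodgkin-Huxley squid-axon model; units: mV, ms, mS/cm^2,
# uA/cm^2, with membrane capacitance C_m = 1 uF/cm^2.
def hh_step(v, m, h, n, i_ext, dt=0.01):
    alpha_m = 0.1 * (v + 40) / (1 - math.exp(-(v + 40) / 10))
    beta_m = 4.0 * math.exp(-(v + 65) / 18)
    alpha_h = 0.07 * math.exp(-(v + 65) / 20)
    beta_h = 1 / (1 + math.exp(-(v + 35) / 10))
    alpha_n = 0.01 * (v + 55) / (1 - math.exp(-(v + 55) / 10))
    beta_n = 0.125 * math.exp(-(v + 65) / 80)
    i_na = 120.0 * m**3 * h * (v - 50.0)   # sodium current (E_Na = 50)
    i_k = 36.0 * n**4 * (v + 77.0)         # potassium current (E_K = -77)
    i_l = 0.3 * (v + 54.4)                 # leak current (E_L = -54.4)
    v += dt * (i_ext - i_na - i_k - i_l)
    m += dt * (alpha_m * (1 - m) - beta_m * m)
    h += dt * (alpha_h * (1 - h) - beta_h * h)
    n += dt * (alpha_n * (1 - n) - beta_n * n)
    return v, m, h, n

v, m, h, n = -65.0, 0.05, 0.6, 0.32        # start near rest
peak = v
for step in range(5000):                   # 50 ms at dt = 0.01 ms
    i_ext = 10.0 if step >= 500 else 0.0   # current step after 5 ms
    v, m, h, n = hh_step(v, m, h, n, i_ext)
    peak = max(peak, v)
print(round(peak, 1))                      # spikes overshoot toward E_Na
```

The three voltage-dependent gating variables (m, h, n) are exactly the "channel" machinery described above: each governs a conductance that opens or closes as a function of membrane potential.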
Biophysically detailed compartmental models of single neurons typically aim to quantitatively reproduce membrane voltages and currents in response to some sort of “synaptic” input. We may think of them as “Hodgkin-Huxley-Rall” models, based on the hypothesis of the neuron as a dynamical system of nonlinear membrane channels distributed over an electrotonic cable skeleton. Such models can incorporate as much biophysical detail as desired (or practical), but in general, all include some explicit assortment of voltage-dependent and transmitter-gated (synaptic) membrane channels. Biophysical Mechanisms in Neuronal Modeling first presents general issues regarding model formulations and data interpretation. It then describes the modeling of various features of Hodgkin-Huxley-Rall models, including Hodgkin-Huxley and Markov kinetic descriptions of voltage- and second-messenger-dependent ion channels as well as methods for describing intracellular calcium dynamics and the associated buffer systems and membrane pumps. The models for each of these mechanisms are at an intermediate level of biophysical detail, appropriate for describing macroscopic variables (e.g., membrane currents, ionic concentrations) on the scale of the entire cell or anatomical compartments thereof. Similar models of synaptic mechanisms are covered in Synaptic Interactions, which provides kinetic models of how synaptic currents arise from ion channels whose opening and closing are controlled (gated) directly or indirectly by the release of neurotransmitter. The article compares several models of synaptic interaction, focusing on simple models based on the kinetics of postsynaptic receptors, and shows how these models capture the time courses of postsynaptic currents of several types of synaptic responses, as well as synaptic summation, saturation, and desensitization.
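The kinetic style of synapse model discussed in Synaptic Interactions can be illustrated with a two-state (closed/open) receptor scheme driven by a pulse of transmitter. The rate constants, pulse duration, and conductance below are illustrative assumptions in the spirit of such models, not values quoted in the article.

```python
def synaptic_current(t_spikes, t_max=0.05, dt=1e-5, alpha=1.1e3,
                     beta=190.0, t_dur=1e-3, g_max=1e-9,
                     e_syn=0.0, v_post=-0.065):
    """Two-state kinetic receptor model (illustrative rate constants):
        closed + T --alpha--> open,   open --beta--> closed
    Transmitter [T] is a 1 mM pulse of duration t_dur after each
    presynaptic spike; r is the fraction of receptors open.  The
    postsynaptic voltage is held fixed for simplicity.
    """
    r, t, trace = 0.0, 0.0, []
    while t < t_max:
        conc = 1.0 if any(ts <= t < ts + t_dur for ts in t_spikes) else 0.0
        r += dt * (alpha * conc * (1 - r) - beta * r)  # receptor kinetics
        trace.append(g_max * r * (v_post - e_syn))     # postsynaptic current
        t += dt
    return trace

trace = synaptic_current(t_spikes=[0.005, 0.02])
print(min(trace))   # peak inward (negative) current, in amperes
```

The fast rise during the transmitter pulse and slower exponential decay afterward reproduce the characteristic time course of a fast excitatory postsynaptic current; adding states to the kinetic scheme is how such models capture saturation and desensitization.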
The membrane potential of central neurons exhibits synaptic noise: fluctuations that depend both on the summed firing of action potentials by neurons presynaptic to the investigated cell and on the spontaneous release of transmitter. Synaptic Noise and Chaos in Vertebrate Neurons argues that, despite its random appearance, synaptic noise may be a true signal associated with neural coding, possibly a chaotic one. In addition to reviewing tools for detecting chaotic behavior, the article pays special attention to Mauthner cells, a pair of identified neurons in the hindbrain of teleost fishes. When the fish is subjected to an unexpected stimulus, one of the cells triggers an escape reaction. The cells' excitability is controlled by powerful inhibitory presynaptic interneurons that continuously generate an intense synaptic noise. While it is still an open question whether this synaptic noise exhibits deterministic chaos or is truly random, it is worth stressing that the "noise" has adaptive value for the fish: the variability along output pathways introduces uncertainty in the expression of the reflex, and therefore enhances the fish's success in evading predators.
Temporal Dynamics of Biological Synapses complements the many studies of synaptic plasticity in the Handbook that focus on long-term changes in synaptic strength by showing how synaptic function can be profoundly influenced by activity over time scales of milliseconds to seconds. Synapses that exhibit such short-term plasticity are powerful computational elements that can have profound impact on cortical circuits (cf. “Dynamic Link Architecture”). Short-term plasticity includes both synaptic depression and a number of components of short-term enhancement (facilitation, augmentation, and posttetanic potentiation) acting over increasingly longer periods of time. Synaptic facilitation appears to result from enhanced transmitter release due to elevated presynaptic calcium levels, while depression is believed to result, in part, from depletion of a readily releasable pool of vesicles. Depression appears to be a particularly prominent feature of transmission at excitatory synapses onto pyramidal cells. In addition to having complex short-term dynamics, synapses are stochastic, and it is argued that constructive roles for unreliable transmission become apparent when short-term plasticity is considered in connection with stochastic transmission, with synapses acting as stochastic temporal filters of their presynaptic spike trains. Indeed, Synaptic Transmission is concerned with the uncertainties introduced by noise and their relation to synaptic plasticity. The probability that a single activated synapse will release neurotransmitter has a broad distribution, well fitted by a gamma function, with a mean near 0.3. The dynamic regulation of synaptic strength depends on a complicated set of mechanisms that record the history of synaptic use over many time scales, and serve to filter the incoming spike train in a way that reflects the past use of the synapse. 
The article provides equations describing how synaptic use determines the number of vesicles available for release and, in turn, the release probability.
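A deterministic vesicle-depletion sketch captures the flavor of such equations: each spike releases a fraction of the readily releasable pool, which then recovers exponentially. The release probability of 0.3 echoes the mean quoted above; the recovery time constant is an illustrative assumption.

```python
import math

def depressing_synapse(spike_times, p_release=0.3, tau_rec=0.8):
    """Deterministic vesicle-depletion sketch (parameters illustrative).

    x is the available fraction of the readily releasable pool; each
    spike transmits p_release * x, and the pool recovers toward 1
    with time constant tau_rec (seconds) between spikes.
    """
    x, last_t, efficacies = 1.0, None, []
    for t in spike_times:
        if last_t is not None:
            x = 1 - (1 - x) * math.exp(-(t - last_t) / tau_rec)  # recovery
        efficacies.append(p_release * x)  # effective strength of this spike
        x -= p_release * x                # depletion by release
        last_t = t
    return efficacies

# A regular 20 Hz train: responses depress toward a steady-state level.
eff = depressing_synapse([0.05 * k for k in range(8)])
print([round(e, 3) for e in eff])
```

Because the available pool reflects the recent spike history, the synapse acts as the temporal filter described above: high-frequency input is progressively discounted while a spike after a long pause transmits at full strength.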
Oscillatory and Bursting Properties of Neurons offers a dynamic systems analysis of the linkage between a fascinating variety of endogenous oscillations (neuronal rhythms) and appropriate sets of channels. However, membrane potential oscillations with apparently similar characteristics can be generated by different ionic mechanisms, and a given cell type may display several different firing patterns under different neuromodulatory conditions. Here, membrane dynamics are described by coupled differential equations, the behavior modes by attractors (cf. “Computing with Attractors”), and the transitions between modes by bifurcations. The rest state is represented by a time-independent steady state, and repetitive firing is represented by a limit cycle. (“Silicon Neurons” shows how such differential equations can be directly mapped into an electronic circuit built using analog VLSI, to allow real-time exploration of the behavior of quite realistic neural models.)
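The dynamical-systems vocabulary here (rest state as a stable fixed point, repetitive firing as a limit cycle, the transition as a bifurcation) can be made concrete with the two-variable FitzHugh-Nagumo model, a standard reduced neuron model used for illustration rather than one treated in this article.

```python
def fitzhugh_nagumo(i_ext, t_max=200.0, dt=0.01, a=0.7, b=0.8, tau_w=12.5):
    """FitzHugh-Nagumo reduction:
        dv/dt = v - v^3/3 - w + I,   dw/dt = (v + a - b*w) / tau_w.
    For small I the system settles to a fixed point (rest); past a
    bifurcation it orbits a limit cycle (repetitive firing).  Returns
    the peak-to-peak range of v over the second half of the run.
    """
    v, w, tail = -1.2, -0.6, []
    steps = int(t_max / dt)
    for k in range(steps):
        dv = v - v**3 / 3 - w + i_ext
        dw = (v + a - b * w) / tau_w
        v += dv * dt
        w += dw * dt
        if k > steps // 2:
            tail.append(v)
    return max(tail) - min(tail)

print(round(fitzhugh_nagumo(0.0), 2))  # rest: amplitude near zero
print(round(fitzhugh_nagumo(0.5), 2))  # limit cycle: large amplitude
```

Sweeping i_ext between these two values locates the bifurcation at which the rest state loses stability, which is precisely the kind of transition between behavioral modes the article analyzes.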
Roughly a dozen different types of ion channels contribute to the membrane conductance of a typical neuron. Activity-Dependent Regulation of Neuronal Conductances takes as its starting point the fact that the electrical characteristics of a neuron depend on the number of channels of each type active within the membrane and on how these channels are distributed over the surface of the cell. A complex array of biochemical processes controls the number and distribution of ion channels by constructing and transporting channels, modulating their properties, and inserting them into and removing them from the neuron’s membrane. The point to note here is that channels are small groupings of large molecules, and they are assembled on the basis of genetic instructions in the cell nucleus. Thus, changing which genes are active (i.e., regulating gene expression) can change the set of channels in a cell, and thus the characteristics of the cell. In fact, electrical activity in the cell can affect a range of processes, from activity-induced gene expression to activity-dependent modulation of assembled ion channels. Channel synthesis, insertion, and modulation are much slower than the usual voltage- and ligand-dependent processes that open and close channels. Thus, consideration of activity-dependent regulation of conductances introduces a dynamics acting on a new, slower time scale into neuronal modeling, a feedback mechanism linking a neuron’s electrical characteristics to its activity. A similar theme is developed in Biophysical Mosaic of the Neuron, which is structured around the metaphor of the mosaic neuron. A mosaic is a collection of discrete parts, each with unique properties, that are fitted together in such a way that an image emerges from the whole in a nonobvious way. Similarly, the neuronal membrane is packed with a diversity of receptors and ion channels and other proteins with a recognizable distribution. 
In addition, the cytoplasm is not just water with ions, but a mosaic of interacting molecular systems that can directly affect the functional properties of membrane proteins. The argument is that, just as a mosaic painting provokes perception of a complete image out of a maze of individually diversified tiles, so a given neuron performs a well-defined computational role that depends not only on the network of cells in which it is embedded, but also to a large extent on the dynamic distribution of macromolecules throughout the cell.
Dendritic Processing focuses on dendrites as electrical input-output devices that operate on time scales ranging from several to a few hundred milliseconds. (See "Dendritic Learning" for modeling of the plasticity of dendritic function and the assertion that the concept of "overall connection strength between two neurons" is ill-defined, since it is the distribution of synapses in relation to dendritic geometry that proves crucial.) The input to a dendrite consists of temporal patterns of synaptic inputs spatially distributed over the dendritic surface, whereas the output is (except, for example, in the case of dendrodendritic interactions) an ionic current delivered to the soma for transformation there, via a threshold mechanism, to a train of action potentials at the axon. The article discusses how the morphology, electrical properties, and synaptic inputs of dendrites interact to perform their input-output operation. It uses cable theory and compartmental modeling to model the spread of electric current in dendritic trees. The variety of excitable (voltage-gated) channels found in many types of dendrites enriches the computational capabilities of neurons, with interaction proceeding in both directions, away from and toward the soma. Numerical methods are available for solving the equations describing branched cables. Dendritic Spines are short appendages found on the dendrites of many different cell types. They are composed of a bulbous "head" connected to the dendrite by a thin "stem." An excitatory synapse is usually found on the spine head, and some spines also have a second, usually inhibitory, synapse located on or near the spine stem. Models in which the spine is represented as a passive electrical circuit show that the large resistance of a thin spine stem can attenuate a synaptic input delivered to the spine head. Other models address calcium diffusion and plasticity in spines.
Current research focuses on the hypothesis that the spine stem provides a diffusional resistance that allows calcium to become concentrated in the spine head and calcium-dependent reactions to be localized to the synapse. This could be very important for plasticity changes, such as those that occur with long-term potentiation.
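The passive-circuit argument about the spine stem can be checked with a steady-state voltage-divider calculation. The conductance and resistance values below are illustrative assumptions for the sketch, not measured values from the articles.

```python
def spine_head_epsp(g_syn, r_stem, r_dendrite, e_syn=0.06):
    """Steady-state voltages (relative to rest) for a passive spine:
    a synaptic conductance g_syn with driving force e_syn on the
    spine head, a stem of resistance r_stem, and a dendrite of input
    resistance r_dendrite.  An illustrative circuit sketch.
    """
    r_path = r_stem + r_dendrite            # head-to-rest current path
    # Voltage divider: the synapse sees g_syn in series with r_path.
    v_head = e_syn * g_syn * r_path / (1 + g_syn * r_path)
    v_dend = v_head * r_dendrite / r_path   # what survives the stem
    return v_head, v_dend

# A thin stem (500 MOhm) versus a thick one (50 MOhm), for a 5 nS
# synapse on a 100 MOhm dendrite:
for r_stem in (500e6, 50e6):
    v_head, v_dend = spine_head_epsp(5e-9, r_stem, 100e6)
    print(round(v_head * 1e3, 2), round(v_dend * 1e3, 2))  # mV
```

Note the trade-off the numbers reveal: the thin stem produces a larger local depolarization in the head (favoring localized, calcium-dependent reactions at the synapse) while attenuating the signal that reaches the dendrite.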
Axonal Path Finding
Cerebellum and Conditioning
Cerebellum and Motor Control
Cerebellum: Neural Plasticity
Conditioning
Dendritic Learning
Development of Retinotectal Maps
Dynamic Link Architecture
Habituation
Hebbian Learning and Neuronal Regulation
Hebbian Synaptic Plasticity
Information Theory and Visual Plasticity
Invertebrate Models of Learning: Aplysia and Hermissenda
NMDA Receptors: Synaptic, Cellular, and Network Models
Ocular Dominance and Orientation Columns
Post-Hebbian Learning Algorithms
Short-Term Memory
Somatotopy: Plasticity of Sensory Maps
Temporal Dynamics of Biological Synapses
Most studies of learning in ANNs involve a variety of learning rules, inspired in great part by the psychological hypotheses of Hebb and Rosenblatt (cf. Section I.3) about ways in which synaptic connections may change their strength as a result of experience. In recent years, much progress has been made in tracing the processes that underlie the plasticity of synapses of biological neurons. The present road map samples this research together with related modeling. Although the emphasis will be on synaptic plasticity, several articles stress the role of axonal growth in forming new connections, and the road map closes with an article suggesting that changes in location of synapses may be just as important as changes in synaptic strength.
Hebb’s idea was that a synapse (what we would now call a Hebbian synapse) strengthens when the presynaptic and postsynaptic elements tend to be coactive. The plausibility of this hypothesis has been enhanced by the neurophysiological discovery of a synaptic phenomenon in the hippocampus known as long-term potentiation (LTP), which is induced by a Hebbian mechanism. Hebb’s postulate has received various modifications to address, e.g., the saturation problem.
Hebbian Synaptic Plasticity shows that a variety of experimental networks ranging from the abdominal ganglion in the invertebrate Aplysia to visual cortex and the CA1 region of hippocampus offer converging validation of Hebb's postulate on strengthening synapses by (more or less) coincident presynaptic and postsynaptic activity. In these networks, similar algorithms of potentiation can be implemented using different cascades of second messengers triggered by activation of synaptic and/or voltage-dependent conductances. Most cellular data supporting Hebb's predictions have been derived from electrophysiological measurements of composite postsynaptic potentials or synaptic currents, or of short-latency peaks in cross-correlograms, which cannot always be interpreted simply at the synaptic level. The basic conclusion of these experiments is that covariance between pre- and postsynaptic activity up- or downregulates the "effective" connectivity between pairs of functionally coupled cells. The article thus suggests that what changes according to a correlational rule is not so much the efficacy of transmission at a given synapse, but rather a more general coupling term mixing the influence of polysynaptic excitatory and inhibitory circuits linking the two cells, modulated by the diffuse network background activation. Replacing this composite interaction by a single coupling term defines an ideal Hebbian synapse.
The crucial role played in the CA1 form of LTP by channels called NMDA receptors in the synapses is further explained in NMDA Receptors: Synaptic, Cellular, and Network Models. NMDA receptors are subtypes of receptors for the excitatory neurotransmitter glutamate and are involved in diverse physiological as well as pathological processes. They mediate a relatively "slow" excitatory postsynaptic potential, and act as coincidence detectors of presynaptic and postsynaptic activity. The interactions between the slow NMDA-mediated and fast AMPA-mediated currents provide the basis for a range of dynamic properties that contribute to diverse neuronal processes. NMDA receptors have attracted much interest in neuroscience because of their role in learning and memory. Their ability to act as coincidence detectors makes them an ideal molecular device for producing Hebbian synapses. The article reviews data related to the biological characteristics of NMDA receptors and models that have been used to describe their function in isolated membrane patches, in neurons, and in complex circuits.
A classic problem with Hebb’s original rule is that it only strengthens synapses. But this means that all synapses would eventually saturate, depriving the cell of its pattern separation ability. A number of biologically inspired responses to this problem are described in the next two articles. Hebbian Learning and Neuronal Regulation stresses that, for both computational and biological reasons, Hebbian plasticity will involve many synapses of the same neuron. Biologically, synaptic interactions are inevitable as synapses compete for the finite resources of a single neuron. Computationally, neuron-specific modifications of synaptic efficacies are required in order to obtain efficient learning, or to faithfully model biological systems. Hence neuronal regulation, a process modulating all synapses of a postsynaptic neuron, is a general phenomenon that complements Hebbian learning. The article shows that neuronal regulation may answer important questions, such as: What bounds the positive feedback loop of Hebbian learning and guarantees some normalization of the synaptic efficacies of a neuron? How can a neuron acquire specificity to particular inputs without being prewired? How can memories be maintained throughout life while synapses suffer degradation due to metabolic turnover? In unsupervised learning, neuronal regulation allows for competition between the various synapses on a neuron and leads to normalization of their synaptic efficacies. In supervised learning, neuronal regulation improves the capacity of associative memory models and can be used to guarantee the maintenance of biological memory systems. Our basic tour of Hebbian learning concludes with Post-Hebbian Learning Algorithms. This article starts by observing that Hebb’s original postulate was a verbally described phenomenological rule, without specification of detailed mechanisms. Subsequent work has shown the computational usefulness of many variations of the original learning rule. 
This article presents background material on conditioning, neural development, and physiologically realistic cellular-level learning phenomena as a prelude to a review of several families of rules providing computational implementations of Hebbian-inspired rules.
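One classic member of this post-Hebbian family is Oja's rule, which answers the saturation problem by adding a decay term that normalizes the weight vector. The sketch below is illustrative (input patterns and learning rate are assumed for the demonstration), showing one way such a rule can be implemented.

```python
import random

def oja_train(patterns, n_steps=5000, eta=0.01, seed=1):
    """Oja's rule for a single linear neuron:
        dw_i = eta * y * (x_i - y * w_i)
    The decay term -eta * y^2 * w_i bounds the positive feedback of
    the plain Hebbian term eta * y * x_i, normalizing |w| toward 1.
    """
    rng = random.Random(seed)
    n = len(patterns[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # small random start
    for _ in range(n_steps):
        x = rng.choice(patterns)
        y = sum(wi * xi for wi, xi in zip(w, x))    # linear output
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Inputs correlated along (1, 1): the weight vector aligns with the
# principal component of the input instead of growing without bound.
pats = [(1.0, 0.9), (0.9, 1.0), (-1.0, -0.9), (-0.9, -1.0)]
w = oja_train(pats)
print([round(wi, 2) for wi in w])
```

Unlike the raw Hebbian rule, the weights here converge to a unit-length vector along the dominant input correlation, so the neuron retains its selectivity rather than saturating.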
Cerebellum and Motor Control reviews a number of models for cerebellar mechanisms underlying the learning of motor skills. The cerebellum can be decomposed into cerebellar nuclei and a cerebellar cortex. The only output cells of the cerebellar cortex are the Purkinje cells, and their only effect is to provide varying levels of inhibition on the cerebellar nuclei. Each Purkinje cell receives two types of input: a single climbing fiber, and many tens of thousands of parallel fibers. The most influential model of cerebellar cortex has been the Marr-Albus model of the formation of associative memories between particular patterns on parallel fiber inputs and Purkinje cell outputs, with the climbing fiber acting as "training signal." Later models place more emphasis on the relation between the cortex and nuclei, and on the way in which the subregions of this coupled cerebellar system can adapt and coordinate the activity of specific motor pattern generators. The plasticity of the cerebellum is approached from a different direction in Cerebellum and Conditioning. Many experiments indicate that the cerebellum is involved in learning and performance of classically conditioned reflexes; the present article reviews a number of models of the role of the cerebellum in rabbit eyelid conditioning. (A more general perspective on conditioning is given in Conditioning and described more fully in the road map Psychology, which describes several formal theories and neural network models for classical and operant conditioning.) Inspired by the Marr-Albus hypothesis, neurophysiological research eventually showed that coincidence of climbing fiber and parallel fiber activity on a Purkinje cell led to long-term depression (LTD) of the synapse from parallel fiber to Purkinje cell. Cerebellum: Neural Plasticity offers readers an exhaustive overview of the data on the neurochemical mechanisms underlying this form of plasticity.
The authors conclude that the timing conditions for LTD induction may account for the temporal specificity of cerebellar motor learning, and suggest that an important future development in the field will be to study developmental aspects of LTD in relation to acquisition of motor skills. However, the article cites only one model of LTD. It is clear that there are immense challenges to neural modelers in exploring the implications of the plethora of neurochemical interactions swirling about this single class of synaptic plasticity and, by implication, the variety of different mechanisms expressed elsewhere in the nervous system.
There is now strong evidence for a process of short-term memory (STM) involved in performing tasks requiring temporary storage and manipulation of information to guide appropriate actions. Short-Term Memory addresses three issues: What are the different types of STM traces? How do intrinsic and synaptic mechanisms contribute to the formation of STM traces? How do STM traces translate into long-term memory representation of temporal sequences? The stress is on the computational mechanisms underlying these processes, with the suggestion that these mechanisms may well underlie a wide variety of seemingly different biological processes. The article examines both the short-term preservation of patterns of neural firing in a circuit and ways in which short-term maintained activity may be transferred into long-term memory traces.
There is no hard and fast line between the cellular mechanisms underlying the development of the nervous system and those involved in learning. Nonetheless, the former emphasizes the questions of how one part of the brain comes to be connected to another and how overall patterns of connectivity are formed, while the latter tends to regard the connections as in place, and asks how their strengths can be modified to improve the network’s performance. Studies of regeneration—the reforming of connections after damage to neurons or cell tracts—are thus associated more with developmental mechanisms than with learning per se. Another significant area of research that complements development is that of aging, but there is still too little work relating aging to neural modeling.
Study of the regeneration of retinotopic eye-brain maps in frogs (i.e., neighboring points in the frog retina map, in a one-to-many fashion, to neighboring points in the optic tectum) has been one of the most fruitful areas for theory-experiment interaction in neuroscience. Following optic nerve section, optic nerve fibers tended to regenerate connections with those target neurons to which they were connected before surgery, even after eye rotation. This suggests that each cell in both retina and tectum has a unique chemical marker signaling 2D location, and that retinal axons seek out tectal cells with the same positional information. However, in experiments in which lesions were made in goldfish retina or tectum, it was found that topographic maps regenerated in conformance with whatever new boundary conditions were created by the lesions; e.g., the remaining half of a retina would eventually connect in a retinotopic way to the whole of the tectum, rather than just to the half to which it was originally connected. Although there is wide variation between species in the degree of order existing in the optic nerve, it is almost always the case that the final map in the tectum is ordered to a greater extent than is the optic nerve. Theory and experiment paint a subtle view in which genetics sets a framework for development, but the final pattern of connections depends both on boundary conditions and on patterns of cellular activity. This view is now paradigmatic for our understanding of how patterns of neural connectivity are determined. The development of such maps appears to proceed in two stages: the first involves axon guidance independent of neural activity; the second involves the refinement of initially crude patterns of connections by processes dependent on neural activity. Axonal Path Finding focuses on the former events, while Development of Retinotectal Maps discusses the latter. 
Understanding the molecular basis of retinotectal map formation has been transformed since the appearance of the first edition of the Handbook by discoveries centering on ephrins and the corresponding Eph receptors. The Eph/ephrins come in two families, A and B, with the A family important for mapping along the rostral-caudal axis of the tectum, while the B family may be important for mapping along the dorsal-ventral axis. Most models of development of retinotectal maps take synaptic strengths as their primary variable between arrays of retinal and tectal locations, with initial synaptic strengths then updated according to rules that depend in various ways on correlated activity, competition for tectal space, molecular gradients, and fiber-fiber interactions. However, actual movement or branching of axons to find their correct targets is rarely considered. Thus, future computational models of retinotectal map formation should take into account data on Eph receptors and ephrin ligands, data on the guidance of retinal axons that enter the tectum by ectopic routes, and the results of retinal and tectal ablation and transplantation experiments. Up to now, the great majority of theoretical work in the neural network tradition has focused on changes in synaptic strengths within a fixed connectional architecture, but how axons chart their initial path toward the correct target structure has generally not been addressed. Axonal Path Finding reviews recent experimental work addressing how retinal ganglion cell axons find the optic disk, how they then exit the retina, why they grow toward the optic chiasm, why some then cross at the midline while others do not, and so on—a body of knowledge that now has the potential to be framed and interpreted in terms of theoretical models. 
Whereas work in neural networks has usually focused on processes such as synaptic plasticity that are dependent on neural activity, models for axon guidance must generally be phrased in terms of activity-independent mechanisms, particularly guidance by molecular gradients. Many fundamental questions remain unresolved, for which theoretical models have the potential to make an important contribution. What is the minimum gradient steepness detectable by a growth cone, and how does this vary with the properties of the receptor-ligand interaction and the internal state of the growth cone? How is a graded difference in receptor binding internally converted into a signal for directed movement? And, how do axons integrate multiple cues?
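The first of these questions invites a back-of-envelope calculation: compare the difference in receptor occupancy across the width of a growth cone with the statistical noise of receptor binding. The sketch below is purely illustrative, with assumed receptor numbers and growth-cone width rather than data from the article.

```python
import math

def detectable(steepness, conc_kd_ratio, n_receptors=10000, width_um=10.0):
    """Back-of-envelope gradient sensing (all numbers assumed).

    In an exponential gradient with fractional concentration change
    `steepness` per micron, the occupancy difference summed over
    n_receptors across a growth cone of width `width_um` must exceed
    the sqrt(N) binomial noise of receptor binding to be detectable.
    conc_kd_ratio = ligand concentration / receptor K_d.
    """
    occ = conc_kd_ratio / (1 + conc_kd_ratio)         # mean occupancy
    d_conc = steepness * width_um                     # fractional conc. diff.
    d_occ = d_conc * occ * (1 - occ)                  # slope of c/(c + Kd)
    signal = n_receptors * d_occ
    noise = math.sqrt(n_receptors * occ * (1 - occ))  # binomial std. dev.
    return signal > noise

# With these assumptions a 1%/um gradient is detectable while a
# 0.01%/um gradient is lost in binding noise:
print(detectable(0.01, 1.0), detectable(0.0001, 1.0))
```

Even this crude estimate shows why the minimum detectable steepness must depend on receptor number, occupancy (and hence the concentration-to-K_d ratio), and growth-cone geometry, which is the form the theoretical question takes.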
Ocular Dominance and Orientation Columns studies two issues that go beyond basic map formation to provide further insight into activity-dependent development. When cells in layer IVc of visual cortex are tested to see which eye drives them more strongly, it is found that ocular dominance takes the form of a zebra-stripe-like pattern of alternating dominance. Model and experiment support the view that the stripes are not genetically specified but instead form through network self-organization. Another classic example is the formation of orientation specificity. A number of models are reviewed in light of current data, both theoretical analysis based on the idea that leading eigenvectors dominate (cf. “Pattern Formation, Biological” and “Pattern Formation, Neural”) and computer simulations.
Temporal Dynamics of Biological Synapses complements the many studies of synaptic plasticity in the Handbook that focus on long-term changes in synaptic strength by showing the importance of fast synaptic changes over time scales of milliseconds to seconds. Short-term plasticity includes both synaptic depression and a number of components of short-term enhancement (facilitation, augmentation, and posttetanic potentiation) acting over increasingly longer periods of time. In addition to having complex short-term dynamics, synapses are stochastic (see “Synaptic Transmission”), and it is argued that constructive roles for unreliable transmission become apparent when short-term plasticity is considered in connection with stochastic transmission, with synapses acting as stochastic temporal filters of their presynaptic spike trains. Dynamic Link Architecture develops the theme of fast synaptic changes at the level of network function, viewing the brain’s data structure as a graph composed of nodes connected by links whose strength can change on two time scales, represented by two variables called temporary weight and permanent weight. The permanent weight corresponds to the usual synaptic weight, can change on the slow time scale of learning, and represents permanent memory. The temporary weight can change on the same time scale as the node activity, providing the dynamic links that, according to this model, constitute the glue by which higher data structures are built up from more elementary ones.
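A toy version of the two-time-scale scheme can make the distinction concrete: a temporary weight tracks current coactivity quickly, within bounds set by a slowly drifting permanent weight. These equations are an illustrative caricature, not the architecture's actual dynamics.

```python
def dynamic_link(coactivity, t_max=1.0, dt=1e-3,
                 tau_temp=0.05, tau_perm=50.0):
    """Illustrative two-time-scale link (parameters assumed):
    the temporary weight j tracks current coactivity on a fast time
    scale, gated by the permanent weight J; J itself drifts slowly
    toward the running level of j (learning / permanent memory).
    """
    j, J, t = 0.0, 0.5, 0.0
    while t < t_max:
        c = coactivity(t)
        j += dt * (c * J - j) / tau_temp  # fast: dynamic binding
        J += dt * (j - J) / tau_perm      # slow: permanent memory
        t += dt
    return j, J

# Coactivity switched on halfway through: the temporary weight follows
# within ~tau_temp, while the permanent weight barely moves.
j, J = dynamic_link(lambda t: 1.0 if t > 0.5 else 0.0)
print(round(j, 3), round(J, 3))
```

The fast variable provides the transient "glue" for binding, while the slow variable changes appreciably only when coactivity persists over many episodes, which is the division of labor the article assigns to temporary and permanent weights.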
Information Theory and Visual Plasticity demonstrates some features of information theory that are relevant to the relaying of information in cortex and presents cases in which information theory led researchers to seek methods for Gaussianizing the input distribution and, in other cases, to seek learning goals for non-Gaussian distributions. The MDL principle (see "Minimum Description Length Analysis") is presented as a learning goal that takes into account the complexity of the decoding network. In particular, the article connects entropy-based methods, projection pursuit, and extraction of simple cells in visual cortex.
As can be seen from the above, neural network models of development and regeneration have been dominated by studies of the visual system. The next article, however, takes us to the somatosensory system. Research in the past decade has demonstrated plastic changes at all levels of the adult somatosensory system in a wide range of mammalian species. Changes in the relative levels of sensory stimulation as a result of experience or injury produce modifications in sensory maps. Somatotopy: Plasticity of Sensory Maps discusses which features of somatotopic maps change and under what conditions, the mechanisms that may account for these changes, and the functional consequences of sensory map changes.
Just as the giant squid axon provided invaluable insights into the active properties of neural membrane summarized in the Hodgkin-Huxley equation, so have invertebrates provided many insights into other basic mechanisms (see “Neuromodulation in Invertebrate Nervous Systems” and “Crustacean Stomatogastric System” for two examples). Invertebrate Models of Learning: Aplysia and Hermissenda does the same for basic learning mechanisms. A ganglion (localized neural network) of these invertebrates can control a variety of different behaviors, yet a given behavior such as a withdrawal response may be mediated by 100 neurons or fewer. Moreover, many neurons are relatively large and can be uniquely identified, functional properties of an individual cell can be related to a specific behavior, and changes in cellular properties during learning can be related to specific changes in behavior. Biophysical and molecular events underlying the changes in cellular properties can then be determined and mathematically modeled. The present article illustrates this with studies of two gastropod mollusks: associative and nonassociative modifications of defensive siphon and tail withdrawal reflexes in Aplysia and associative learning in Hermissenda.
Habituation describes one of the simplest forms of learning, the progressive decrement in a behavioral response with repeated presentations of the eliciting stimulus, and reveals the complexity in this apparent simplicity. This article reviews the fundamental characteristics of habituation and describes experimental preparations in which the neural basis of habituation has been examined as well as attempts to model habituation. Experimental studies have identified at least two important neural mechanisms of habituation, homosynaptic depression within the reflex circuit and extrinsic descending modulatory input. A number of systems are put forward as good candidates for future modeling. Habituation of defensive reflexes was among the first types of learning explained successfully at the cellular level. Habituation in the crayfish tail-flip reflex, due to both afferent depression as well as descending inhibition, offers the opportunity to analyze the interaction and cooperativity of mechanisms intrinsic and extrinsic to the reflex circuit. The nematode C. elegans offers the possibility of a genetic analysis of habituation.
As shown in “Dendritic Processing,” dendrites are highly complex structures, both anatomically and physiologically, and are the principal substrates for information processing within the neuron. Dendritic Learning assesses the consequences of axodendritic structural plasticity for learning and memory, countering the view that neural plasticity is limited to the strengthening and weakening of existing synaptic connections. In particular, the article supports the view that long-term storage may involve the correlation-based sorting of synaptic contacts onto the many separate dendrites of a target neuron. In the models offered in this article, the output of the cell represents the sum of a moderately large set of separately thresholded dendritic subunits, so that a single neuron as modeled here is equivalent to a conventional ANN built from two layers of point neurons. As a result, the concept of “overall connection strength between two neurons” is no longer well defined, for it is the distribution of synapses in relation to dendritic geometry that proves crucial.
Adaptive Spike Coding
Integrate-and-Fire Neurons and Networks
Localized Versus Distributed Representations
Motor Cortex: Coding and Decoding of Directional Operations
Optimal Sensory Encoding
Population Codes
Rate Coding and Signal Processing
Sensory Coding and Information Transmission
Sparse Coding in the Primate Cortex
Synchronization, Binding and Expectancy
Synfire Chains
In the McCulloch-Pitts neuron, the output is binary, generated on a discrete-time scale; at the other extreme, the Hodgkin-Huxley equations can create a dazzling array of patterns of axonal activity in which the shape as well as the timing of each spike is continuously variable. In between, we have models such as the leaky integrator model, in which only the rate of firing of a cell is significant, while in the spiking neuron model the timing but not the shape of spikes is continuously variable. This raises the question of how sensory inputs and motor outputs, let alone “thoughts” and other less tangible intervening variables, are coded in neural activity. In answering this question, we must not only seek to understand the significance of the firing pattern of an individual neuron but also probe how variables may be encoded in patterns of firing distributed across a whole population of neurons.
Retinotopic feature maps are the norm near the visual periphery and up into the early stages of the visual cortex. Here, the firing of a cell peaks for stimuli that fall on a specific patch of the retina and also for a specific feature. Perhaps the most famous example of this is provided by the simple cells discovered in visual cortex by Hubel and Wiesel, which are edge-sensitive cells tuned both for the retinal position and orientation of the edge. In such studies, the cell is characterized by its firing rate during presentation of the stimulus. Similar results are seen for other feature types (see “Feature Analysis”) and other sensory systems. The issue of how other information may be coded by activity in the nervous systems of animals is addressed in a number of articles. Localized Versus Distributed Representations asks whether the final neural encoding of visual recognition of one’s grandmother, say, involves neurons that respond selectively to “grandmother”—so-called “grandmother cells”—or whether the sight of grandmother is never made explicit at the single neuron level, with the representation instead distributed across a large number of cells, none of which responds selectively to “grandmother” alone. Few neuroscientists argue that individual neurons might explicitly represent particular objects, but many connectionists have used localist representations to model phenomena that include word and letter perception, although they generally insist that the units in their models are not real neurons. The article examines neurophysiological evidence that both distributed and local coding are used in high-order visual areas and then goes “against the stream” by putting forward computational reasons for preferring representations that are more localist in some parts of the brain, before examining how work on temporal coding schemes has changed the nature of the local versus distributed debate.
Sparse Coding in the Primate Cortex marshals theoretical reasons and experimental evidence suggesting that the brain adopts a compromise between distributed and local representations that is often referred to as sparse coding. This thesis is illustrated with data on object recognition and face recognition in inferotemporal cortex (the “what” pathway) in monkey.
Perhaps the best-known example of motor coding is that described in Motor Cortex: Coding and Decoding of Directional Operations for the relation between the direction of reaching and changes in neuronal activity that have been established for several brain areas, including the motor cortex. The cells involved each have a broad tuning function the peak of which is considered to be the “preferred” direction of the cell. A movement in a particular direction will engage a whole population of cells. It is found that, during discrete movements in 2D and 3D space, the weighted vector sum of these neuronal preferences is a “population vector” which points in (close to) the direction of the movement. Such examples underlie the more general analysis given in Population Codes. Population codes are computationally appealing both because the overlap among the neurons’ tuning curves allows precise encoding of values that fall between the peaks of two adjacent tuning curves and because many cortical functions, such as sensorimotor transformations, can be easily modeled with population codes. The article focuses on decoding, or reading out, population codes. Neuronal responses are noisy, leading to the need for good estimators for the encoded variables. The article reviews the various estimators that have been proposed, and considers their neuronal implementations. Moreover, there are cases where it is reasonable to assume that population activity codes for more than just a single value, and could even code for a whole probability distribution. The goal of decoding is then to recover an estimate of this probability distribution.
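The weighted vector sum underlying the population vector can be sketched for a hypothetical population of broadly (cosine) tuned cells; the cell count, baseline, and gain below are illustrative assumptions, not values from the article:

```python
import numpy as np

# Hypothetical population of cosine-tuned cells whose preferred directions
# evenly cover the circle (cell count and rates are illustrative).
n_cells = 64
preferred = np.linspace(0.0, 2.0 * np.pi, n_cells, endpoint=False)

def rates(movement_dir, baseline=10.0, gain=8.0):
    """Broad cosine tuning: each cell fires most for its preferred direction."""
    return baseline + gain * np.cos(movement_dir - preferred)

def population_vector(r):
    """Weighted vector sum of the cells' preferred-direction unit vectors."""
    w = r - r.mean()                     # weight each cell by its modulation
    x = np.sum(w * np.cos(preferred))
    y = np.sum(w * np.sin(preferred))
    return np.arctan2(y, x) % (2.0 * np.pi)

true_dir = 1.2
decoded = population_vector(rates(true_dir))
# for noise-free cosine tuning the population vector recovers the direction
```

With noisy responses the same estimator still works, but, as the article on Population Codes stresses, one then needs to ask how good an estimator it is compared to alternatives.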
Integrate-and-Fire Neurons and Networks shows how these models offer potential principles of coding and dynamics. At the single neuron level, it is shown that coherent input is more efficient than incoherent spikes in driving a postsynaptic neuron. Questions discussed for homogeneous populations include the conditions under which it is possible, in the absence of an external stimulus, to stabilize a population of spiking neurons at a reasonable level of spontaneous activity; the relation of the frequency of collective oscillations to neuronal parameters; and how rapidly population activity responds to changes in the input. An extension to mixed excitatory/inhibitory populations as found in the cortex is also discussed. Synchronization, Binding and Expectancy argues that the “binding” of cells that correspond to features of a given visual object may exploit another dimension of cellular firing, namely, the phase at which a cell fires within some overall rhythm of firing. The article presents data consistent with the proposal that the synchronization of responses on a time scale of milliseconds provides an efficient mechanism for response selection and binding of population responses. Synchronization also increases the saliency of responses because it allows for effective spatial summation in the population of neurons receiving convergent input from synchronized input cells. Synfire Chains were introduced to account for the appearance of precise firing sequences with long interspike delays; the article deals with how such chains might be generated, how activity propagates along a chain, how synfire chains can be used to compute, and how they might be detected in electrophysiological recordings. A synfire chain is composed of many pools (or layers) of neurons connected in a feedforward fashion.
In a random network with moderate connectivity, many synfire chains can be found by chance, but such random synfire chains may not function reproducibly unless the synaptic connections are strengthened by some appropriate learning rule. A given neuron can participate in more than one synfire chain. The extent to which such repeated membership can take place without compromising reproducibility is known as the memory capacity of synfire chains. Synfire chains may be considered a special case of the “cell assembly” suggested by Hebb. However, in Hebb’s concepts the cell assembly was a network with multiple feedback connections, whereas the synfire chain is a feedforward net. This allows for much faster computations by synfire chains. While noting that there have also been criticisms of the theory, the article argues that classical anatomy and physiology of the cortex sustain the idea that activity may be organized in synfire chains and that one can create compositional systems from synfire chains.
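The bistable character of propagation—a strongly activated pool sustains the wave down the chain while weak activation dies out—can be illustrated with a toy feedforward model (pool size, connection count, and threshold are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def synfire_step(active_frac, n_pool=100, n_syn=60, threshold=30):
    """One pool-to-pool step of a toy synfire chain: each neuron of the
    next pool receives n_syn connections from the previous pool and fires
    if enough of them were active (all numbers are illustrative)."""
    fired = 0
    for _ in range(n_pool):
        inputs = rng.random(n_syn) < active_frac   # which inputs were active
        if inputs.sum() >= threshold:
            fired += 1
    return fired / n_pool

# Iterating down ten pools: strong initial activation sustains a stable
# wave, while weak activation dies out -- two attractors of the iteration.
strong, weak = 0.8, 0.3
for _ in range(10):
    strong = synfire_step(strong)
    weak = synfire_step(weak)
```

The threshold-versus-mean-input gap plays the role of the strengthened synapses mentioned above: lowering the threshold toward chance levels destroys the reproducibility of propagation.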
Rate Coding and Signal Processing investigates ways in which the sequence of spike occurrence times may encode the information that a neuron communicates to its targets. Spike trains are often quite variable under seemingly identical stimulation conditions. Does this variability carry information about the stimulus? The term rate coding is applied in situations where the precise timing of spikes is thought not to play a significant role in carrying sensory information. The article analyzes the sensory information conveyed by two types of rate codes, mean firing rate codes and instantaneous firing rate codes, by adapting classical methods of statistical signal processing to the analysis of neuronal spike trains. While focusing on various examples of rate coding, such as that of neurons of weakly electric fish sensitive to electrical field amplitude, the article also notes cases in which spike timing plays a crucial role.
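The two rate codes contrasted in the article can be sketched on an invented spike train (the spike times and the 1 s window are illustrative, not data from the article):

```python
import numpy as np

# An invented spike train (times in seconds) observed over a 1 s window.
spikes = np.array([0.02, 0.08, 0.11, 0.19, 0.23, 0.31, 0.44, 0.52, 0.61, 0.78])
T = 1.0

# Mean firing rate code: one number summarizes the whole window.
mean_rate = len(spikes) / T                      # 10 spikes in 1 s = 10 Hz

# Instantaneous firing rate code: counts in short bins track rate changes.
bin_width = 0.1
edges = np.arange(0.0, T + bin_width, bin_width)
counts, _ = np.histogram(spikes, bins=edges)
inst_rate = counts / bin_width                   # one rate per 100 ms bin
```

The instantaneous code retains the within-trial modulation that the mean rate discards; whether the residual spike-timing jitter carries further information is exactly the question the article raises.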
Recent years have seen an increasing number of quantitative studies of neuronal coding based on Shannon’s information theory, in which the “information” or “entropy” of a message is a purely statistical measure based on the probability of the message within an ensemble: the less likely the message is to occur, the greater its information content. Sensory Coding and Information Transmission reviews two recent approaches to measuring transmitted information. The first is based on direct estimation of the spike train entropies in terms of which transmitted information is defined; the second is based on an expansion to second order in the length of the spike trains. The meaning of any signal that we receive from our environment is modulated by the context within which it appears. Adaptive Spike Coding explores the analysis of “context” as the statistical ensemble in which the signal is embedded. Interpreting a message requires both registering the signal itself and knowing something about this statistical ensemble. The relevant temporal or spatial ensemble depends on the task. Information theoretically, representations that appropriately take into account the statistical properties of the incoming signal are more efficient (see Optimal Sensory Encoding and “Information Theory and Visual Plasticity”). The article focuses on neural adaptation, reversible change in the response properties of neurons on short time scales. Since the first observations of adaptation in spiking neurons, it has been suggested that adaptation serves a useful function for information processing, preventing a neuron from continuing to transmit redundant information; on this view, both the filtering and the threshold function of a neuron are adaptive functions of the input that may serve the goal of increasing information transmission.
Issues include adaptation to the stimulus distribution, with the information about the ensemble read off from the statistics of spike time differences; the separation of different time scales in adaptation; and adaptation of receptive fields. The article also explores the role of calcium and of channel dynamics in providing adaptation mechanisms.
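The direct entropy-estimation approach mentioned above can be sketched by treating short binary "words" of a discretized spike train as the messages of Shannon's theory (word length and the test trains are illustrative; the bias corrections used in the literature are omitted):

```python
import numpy as np
from collections import Counter

def word_entropy(binary_train, word_len=4):
    """Direct estimate of spike-train entropy (bits per word) from the
    empirical distribution of binary 'words'; finite-data bias corrections
    used in the literature are omitted from this sketch."""
    words = [tuple(binary_train[i:i + word_len])
             for i in range(len(binary_train) - word_len + 1)]
    counts = Counter(words)
    n = len(words)
    p = np.array([c / n for c in counts.values()])
    return float(-(p * np.log2(p)).sum())

# A strictly alternating train allows only two distinct words (about 1 bit),
# while a random train approaches the maximum of word_len bits per word.
low = word_entropy([0, 1] * 50)
rng = np.random.default_rng(1)
high = word_entropy(list(rng.integers(0, 2, 1000)))
```

Transmitted information is then the difference between such a total entropy and the entropy of responses to a fixed stimulus.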
Optimal Sensory Encoding focuses on the visual system, seeking to understand what type of data encoding for signals passing from retina to cerebral cortex could reduce the data rate without significant information loss, exploiting the fact that nearby image pixels tend to convey similar signals and thus carry redundant information. One strategy is to transform the original redundant signal (e.g., in photoreceptors) to nonredundant signals in the retinal ganglion cells or cortical neurons, as in the Infomax proposal. The article presents different coding schemes with different advantages. The retinal code has the advantage of small and identical receptive field (RF) shapes, involving shorter neural wiring and easier specifications. The cortical multiscale code is preferred when invariance is needed for objects moving in depth. Again, whereas the Infomax principle applies well to explain the RFs of the more numerous class of retinal ganglion cells, the P cells in monkeys or X cells in cats, another class of ganglion cells, M cells in monkeys or Y cells in cats, have RFs that are relatively larger, color unselective, and tuned to higher temporal frequencies. These M cells do not extract the maximum information possible (Infomax) about the input but can serve to extract the information as quickly as possible. It is argued that information theory is more likely to find its application in the early stages of sensory processing, before information is selected or discriminated for any specific cognitive task, and that optimal sensory coding in later stages of sensory pathways will depend on the cognitive tasks at hand and thus require alternative theories.
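The redundancy-reduction idea can be illustrated with two "pixels" that share a common signal; whitening, used here as a simple stand-in for the Infomax-derived codes discussed in the article, removes the correlation between them (the signal model and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy image signals: two neighboring pixels carrying largely the same
# shared component plus independent noise -- redundant inputs.
shared = rng.normal(size=5000)
pixels = np.stack([shared + 0.3 * rng.normal(size=5000),
                   shared + 0.3 * rng.normal(size=5000)])

# Whitening: one classical decorrelating transform (a sketch of the idea
# of redundancy reduction, not the Infomax algorithm itself).
C = np.cov(pixels)
evals, evecs = np.linalg.eigh(C)
whiten = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
outputs = whiten @ pixels

corr_in = np.corrcoef(pixels)[0, 1]      # close to 1: highly redundant
corr_out = np.corrcoef(outputs)[0, 1]    # close to 0: redundancy removed
```

Infomax goes further by matching the transform to the full signal-and-noise statistics, which is what yields the center-surround RFs discussed above.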
Cortical Hebbian Modules
Cortical Population Dynamics and Psychophysics
Dopamine, Roles of
Hippocampal Rhythm Generation
Integrate-and-Fire Neurons and Networks
Layered Computation in Neural Networks
Neuromodulation in Invertebrate Nervous Systems
Neuromodulation in Mammalian Nervous Systems
Recurrent Networks: Neurophysiological Modeling
Sleep Oscillations
Temporal Integration in Recurrent Microcircuits
We turn now to studies of biological neural networks, a study complemented by articles in the road map Mammalian Brain Regions and in other road maps on sensory systems, memory, and motor control.
Cortical Hebbian Modules models the activity seen in cortical networks during the delay period following the presentation of the stimulus in a delayed match-to-sample or delayed eye-movement task. The rates observed are in the range of about 10–20 spikes/s, with the subset of neurons that sustain elevated rates being selective for the sample stimulus and concentrated in localized columns in associative cortex. The article shows how to model these selective activity distributions through the autonomous local dynamics in the column. The model presents neural elements and synaptic structures that can reproduce the observed neuronal spike dynamics, showing how Hebbian synaptic dynamics can give rise, in a process of training, to a synaptic structure in the local module capable of sustaining selective activity during the delay period. The mathematical framework for the analysis is provided by the mean field theory of statistical mechanics.
Layered Computation in Neural Networks abstracts from the biology to present a general framework for modeling computations performed in layered structures (which occur in many parts of the vertebrate and invertebrate brain, including the optic tectum, the avian visual wulst, and the cephalopod optic lobe, as well as the mammalian cerebral cortex). A general formalism is presented for the connectivity between layers and the dynamics of typical units of each layer. Information processing capabilities of neural layers include filter operations; lateral cooperativity and competition that can be used in, e.g., stereo vision and winner-take-all; topographic mapping that underlies the allocation of cortical neurons to different parts of the visual field (fovea/periphery), or the processing of optic flow patterns; and feature maps and population coding, which may be applied both to sensory systems and to “motor fields” of neurons so that the flow of activity in motor areas can predict initiated movements. In a related vein, Cortical Population Dynamics and Psychophysics describes cortical population dynamics in the form of structurally simple differential equations for the neurons’ firing activities, using a model class introduced by Wilson and Cowan. The Wilson-Cowan model is powerful enough to reproduce a variety of cortical phenomena and captures the dynamics of neuronal populations seen in a variety of experiments, yet simple enough to allow for analytical treatment that yields an understanding of the mechanisms leading to the observed behavior. The model is applied here to explain dynamical properties of the primate visual system on different levels, reaching from single neuron properties like selectivity for the orientation of a stimulus up to higher cognitive functions related to the binding and processing of stimulus features in psychophysical discrimination experiments.
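The Wilson-Cowan form can be sketched as a pair of coupled rate equations for the excitatory and inhibitory population activities (the coupling strengths, inputs, and time constants below are invented for illustration; they are not the parameters of the article):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wilson_cowan(T=200.0, dt=0.1, P=1.25, Q=0.0):
    """Minimal Wilson-Cowan pair: mean activities E and I of an excitatory
    and an inhibitory population (all parameter values are illustrative)."""
    wEE, wEI, wIE, wII = 12.0, 10.0, 10.0, 2.0   # coupling strengths
    tauE, tauI = 1.0, 2.0                        # population time constants
    E, I = 0.1, 0.1
    trace = []
    for _ in range(int(T / dt)):
        dE = (-E + sigmoid(wEE * E - wEI * I + P)) / tauE
        dI = (-I + sigmoid(wIE * E - wII * I + Q)) / tauI
        E += dt * dE
        I += dt * dI
        trace.append(E)
    return np.array(trace)

trace = wilson_cowan()
# the sigmoid keeps both activities bounded between 0 and 1
```

Depending on the couplings and inputs, such a pair settles to a fixed point or a limit cycle, which is what makes the model class amenable to the analytical treatment the article describes.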
Hippocampal Rhythm Generation notes that global brain states in both normal and pathological situations may be associated with spontaneous rhythmic activities of large populations of neurons. This article presents data and models on the main such states associated with the hippocampus: the two main normally occurring states—the theta rhythm with the associated gamma oscillation, and the irregular sharp waves (SPW) with the associated high-frequency (ripple) oscillation—and a pathological brain state associated with epileptic seizures. Several different modeling strategies are compared in studying rhythmicity in the hippocampal CA3 region.
Sleep Oscillations analyzes cortical and thalamic networks at multiple levels, from molecules to single neurons to large neuronal assemblies, with techniques ranging from intracellular recordings to computer simulations, to illuminate the generation, modulation, and function of brain oscillations. Sleep is characterized by synchronized events in billions of synaptically coupled neurons in thalamocortical systems. The early stage of quiescent sleep is associated with EEG spindle waves, which occur at a frequency of 7 to 14 Hz; as sleep deepens, waves with slower frequencies appear on the EEG. The other sleep state, associated with rapid eye movements (REM sleep) and dreaming, is characterized by abolition of low-frequency oscillations and an increase in cellular excitability, very much like wakefulness, although motor output is markedly inhibited. Activation of a series of neuromodulatory transmitter systems during arousal blocks low-frequency oscillations, induces fast rhythms, and allows the brain to recover full responsiveness.
It is a truism that similarity of input-output behavior is no guarantee of similarity of internal function in two neural networks. In particular, a recurrent neural network trained by backpropagation to mimic some biological function may have little internal resemblance to the neural networks responsible for that function in the living brain. Nonetheless, Recurrent Networks: Neurophysiological Modeling demonstrates that dynamic recurrent network models (see “Recurrent Networks: Learning Algorithms” for the formal background) can provide useful tools to help systems neurophysiologists understand the neural mechanisms mediating behavior. Biological experiments typically involve bits of the system; neural network models provide a method of generating working models of the complete system. Confidence in such models is increased if they not only simulate dynamic sensorimotor behavior but also incorporate anatomically appropriate connectivity. The utility of such models is illustrated in the analysis of four types of biological function: oscillating networks, primate target tracking, short-term memory tasks, and the construction of neural integrators.
As is evident in the road map Biological Neurons and Synapses, not all neurons are alike: they show a rich variety of conductances that endow them with different functional properties. These properties and hence the collective activity of interacting groups of neurons are not fixed, but are instead subject to modulation. The term neuromodulation usually refers to the effect of neurochemicals such as acetylcholine, dopamine, norepinephrine, and serotonin, and other substances, including neuropeptides. By contrast with the rapid transmission of information through the nervous system by excitatory and inhibitory synaptic potentials, neuromodulators primarily activate receptor proteins, which do not contain an ion channel (metabotropic receptors). These receptors in turn activate enzymes, which change the internal concentration of substances called second messengers. Second messengers cause slower and longer-lasting changes in the physiological properties of neurons, resulting in changes in the processing characteristics of the neural circuit. Neuromodulation in Invertebrate Nervous Systems stresses that the sensory information an animal needs depends on a number of factors, including its activity patterns and motivational state. The modulation of the sensitivities of many sensory receptors is shown for a stretch receptor in crustaceans. Modulators can activate, terminate, or modify rhythmic pattern-generating networks. One example of such “polymorphism” is that neuromodulation can reconfigure the same network to produce either escape swimming or reflexive withdrawal in the nudibranch mollusk Tritonia. Mechanisms and sites of neuromodulation include alteration of intrinsic properties of neurons, alteration of synaptic efficacy by neuromodulators, and modulation of neuromuscular junctions and muscles. 
All this makes clear the subtlety of neuronal function that must be addressed by computational neuroscience and that may inspire the design of a new generation of artificial neurons. Turning to the mammalian brain, we find that the anatomical distribution of fibers releasing neuromodulatory substances in the brain is usually very diffuse, with the activity of a small number of neuromodulatory neurons influencing the functional properties of broad regions of the brain. Neuromodulation in Mammalian Nervous Systems starts by summarizing physiological effects of neuromodulation, including effects on resting membrane potential of pyramidal cells and interneurons, spike frequency adaptation, synaptic transmission, and long-term potentiation. It is stressed that the effect of a neurochemical is receptor dependent: a single neuromodulator such as serotonin can have dramatically different effects on different neurons, depending on the type of receptor it activates. Indeed, a chemical may function as a neurotransmitter for one receptor and as a neuromodulator for another. The second half of the article reviews neural network models that help us understand how neuromodulatory effects that appear small at the single neuron level may have a significant effect on dynamical properties when distributed throughout a network. The article reviews several different models of the function of modulatory influences in neural circuits, including noradrenergic modulation of attentional processes (noradrenergic neurons are those that release norepinephrine, also known as noradrenaline), dopaminergic modulation (by dopamine) of working memory, cholinergic modulation (by acetylcholine) of input versus internal processing, and modulation of oscillatory dynamics in cortex and thalamus. Dopamine, Roles of then focuses specifically on roles of dopamine in both neuromodulation and in synaptic plasticity.
Dopamine is a neuromodulator that originates from small groups of neurons in the ventral tegmental area, the substantia nigra, and the diencephalon. Dopaminergic projections are in general very diffuse and reach large portions of the brain. The time scales of dopamine actions are diverse, from a few hundred milliseconds to several hours. The article focuses on the mesencephalic dopamine centers because they are the most studied, and because they are thought to be involved in diseases such as Tourette’s syndrome, schizophrenia, Parkinson’s disease, Huntington’s disease, drug addiction, and depression. These centers are also involved in such normal brain functions as working memory, reinforcement learning, and attention. The article discusses the biophysical effects of dopamine, how dopamine levels influence working memory, the ways in which dopamine responses resemble the reward prediction signal of the temporal difference model of reinforcement learning, and the role of dopamine in allocation of attention.
Integrate-and-Fire Neurons and Networks presents relatively simple models that take account of the fact that most biological neurons communicate by action potentials, or spikes (see also “Spiking Neurons, Computation with”). In contrast to the standard neuron model used in ANNs, integrate-and-fire neurons do not rely on a temporal average over the pulses. Instead, the pulsed nature of the neuronal signal is taken into account and considered as potentially relevant for coding and information processing. However, integrate-and-fire models do not explicitly describe the form of an action potential. Integrate-and-fire and similar spiking neuron models are phenomenological descriptions on an intermediate level of detail. Compared to other single-cell models, they allow coding principles to be discussed in a transparent manner. Moreover, the dynamics in networks of integrate-and-fire neurons can be analyzed mathematically, and large systems with thousands of neurons can be simulated rather efficiently.
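The simplicity that makes these models analyzable and efficient to simulate is visible in a minimal leaky integrate-and-fire sketch (membrane parameters and input currents are illustrative; units are loosely mV and ms):

```python
import numpy as np

def lif(I, dt=0.1, tau=10.0, v_rest=-65.0, v_th=-50.0, v_reset=-65.0):
    """Leaky integrate-and-fire: the membrane potential integrates input
    and leaks toward rest; a spike is recorded and the potential reset on
    threshold crossing. The shape of the action potential is not modeled."""
    v = v_rest
    spike_times = []
    for k, i_t in enumerate(I):
        v += dt * (-(v - v_rest) + i_t) / tau    # Euler step, ms time base
        if v >= v_th:
            spike_times.append(k * dt)
            v = v_reset
    return spike_times

# Constant suprathreshold drive gives regular firing whose rate grows with
# the input current (illustrative values).
few = lif(np.full(10000, 16.0))
many = lif(np.full(10000, 30.0))
```

Only spike times are produced, which is precisely the level of description at which the coding questions of the preceding road map can be posed.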
Temporal Integration in Recurrent Microcircuits hypothesizes that the ability of neural computation in behaving organisms to produce a response at any time that depends appropriately on earlier sensory inputs and internal states rests on a common principle by which neural microcircuits operate in different cortical areas and species. The article argues that, while tapped delay lines, finite state machines, and attractor neural networks are suitable for modeling specific tasks, they appear to be incompatible with results from neuroanatomy (highly recurrent diverse circuitry) and neurophysiology (fast transient dynamics of firing activity with few attractor states). The authors thus view the transient dynamics of neural microcircuits as the main carrier of information about past inputs, from which specific information needed for a variety of different tasks can be read out in parallel and at any time by different readout neurons. This approach leads to computer models of generic recurrent circuits of integrate-and-fire neurons for tasks that require temporal integration of inputs and, it is argued, provides a new conceptual framework for the experimental investigation of neural microcircuits and larger neural systems.
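The readout idea can be sketched in the style of an echo-state/liquid-state model: a generic random recurrent circuit holds a fading trace of past inputs in its transient dynamics, and a trained linear readout recovers a past input from the current state alone (the network size, scaling, and delayed-recall task are illustrative assumptions, not the authors' model):

```python
import numpy as np

rng = np.random.default_rng(5)

# A random recurrent "reservoir" driven by a scalar input stream.
n, T, delay = 100, 2000, 3
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n)) * 0.9  # spectral radius ~0.9
w_in = rng.normal(size=n)
u = rng.uniform(-1.0, 1.0, T)

x = np.zeros(n)
states = np.zeros((T, n))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])   # transient recurrent dynamics
    states[t] = x

# Different readouts could extract different quantities in parallel; here a
# single least-squares readout reports u(t - delay) from the current state.
X, y = states[delay:], u[:-delay]
w_out, *_ = np.linalg.lstsq(X, y, rcond=None)
err = np.mean((X @ w_out - y) ** 2)
```

No attractor is needed: the information about the recent past lives in the transient itself, which is the conceptual point of the article.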
Amplification, Attenuation, and Integration
Canonical Neural Models
Chains of Oscillators in Motor and Sensory Systems
Chaos in Biological Systems
Chaos in Neural Systems
Collective Behavior of Coupled Phase Oscillators
Computing with Attractors
Cooperative Phenomena
Dynamics and Bifurcation in Neural Nets
Dynamics of Association and Recall
Energy Functionals for Neural Networks
Optimization, Neural
Pattern Formation, Biological
Pattern Formation, Neural
Phase-Plane Analysis of Neural Nets
Self-Organization and the Brain
Short-Term Memory
Statistical Mechanics of Neural Networks
Stochastic Resonance
Winner-Take-All Networks
Much interest in ANNs has been based on the use of trainable feedforward networks as universal approximators for functions f: X → Y from the input space X to the output space Y. However, their provenance was more general. The founding paper of Pitts and McCulloch established the result that, by the mid-1950s, could be rephrased as saying that any finite automaton could be simulated by a network of McCulloch-Pitts neurons. A finite automaton is a discrete-time dynamic system; that is, on some suitable time scale, it specifies the next state q(t + 1) as a function δ(q(t), x(t)) of the current state and input (for articles related to automata and theory of computation, see the road map Computability and Complexity). But a neuron can be modeled as a continuous-time system (as in a leaky integrator neuron with the membrane potential as the state variable). A network of continuous-time neurons can then be considered as a continuous-time system with the rate of change of the state (which could, for example, be a vector whose elements are the membrane potentials of the individual neurons) defined as a function q˙(t) = f(q(t), x(t)) of the current state and input. When the input is held constant, the network (whether discrete- or continuous-time) may be analyzed by dynamical systems theory. Computing with Attractors shows some of the benefits of such an approach. In particular, a net with internal loops may go to equilibrium (providing a state from which the answer to some problem may be read out), enter a limit cycle (undergoing repetitive oscillations which are useful in control of movement, and in other situations in which a “clock cycle” is of value), or exhibit chaotic behavior (acting in an apparently random way, even though it is deterministic). In particular, the article builds on the notion of a Hopfield network. 
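The automaton-simulation result can be illustrated in miniature: a two-state finite automaton (a set/reset latch) realized by a single recurrent McCulloch-Pitts unit, whose recurrent connection carries the state q(t) into the computation of q(t + 1). The weights are one illustrative choice, not the general construction of the theorem:

```python
def mcp_neuron(weights, threshold, inputs):
    """McCulloch-Pitts unit: binary output, fires iff the weighted sum
    of its binary inputs reaches threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def sr_automaton(events):
    """Two-state automaton delta(q, (set, reset)) from one recurrent
    McCulloch-Pitts neuron; weights chosen so that reset overrides set."""
    q = 0
    history = []
    for set_, reset in events:
        q = mcp_neuron([2.0, -3.0, 1.0], 1.0, [set_, reset, q])
        history.append(q)
    return history

# set, then idle (the state persists), then reset, then idle:
out = sr_automaton([(1, 0), (0, 0), (0, 0), (0, 1), (0, 0)])
# out == [1, 1, 1, 0, 0]
```

The general result replaces this single unit by a network with one layer per state-transition computation, but the principle is the same: recurrent connections store the automaton state.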
Hopfield contributed much to the resurgence of interest in neural networks in the 1980s by associating an “energy function” with a network, showing that if only one neuron changed state at a time, a symmetrically connected net would settle to a local minimum of the energy, and that many optimization problems could be mapped to energy functions for symmetric neural nets. Energy Functionals for Neural Networks uses the notion of Lyapunov function from the dynamical study of ordinary differential equations to show how the definition of energy function and the conditions for convergence to a local minimum can be broadened considerably. (Of course, a network undergoing limit cycles or chaos will not have an energy function that is minimized in this sense.) Optimization, Neural shows that this property can be exploited to solve combinatorial optimization problems that require a more or less exhaustive search to achieve exact solutions, with a computational effort growing exponentially or worse with system size. The article shows that ANN methods can provide heuristic methods that yield reasonably good approximate solutions. Recurrent network methods based on deterministic annealing use an interpolating continuous (analog) space, allowing for shortcuts to good solutions (compare “Simulated Annealing and Boltzmann Machines”). The key to the approach offered here is the technique of mean-field approximation from statistical mechanics. While early neural optimizations were confined to problems encodable with a quadratic energy in terms of a set of binary variables, in the past decade the method has been extended to deal with more general problem types, both in terms of variable types and energy functions, and has evolved to a general-purpose heuristic for combinatorial optimization.
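Hopfield's convergence argument is easy to check numerically. The sketch below (illustrative random symmetric weights, zero thresholds, both assumptions made for brevity) verifies that asynchronous updates of a symmetric network never increase the energy E(s) = −½ sᵀWs, so the state slides downhill into a local minimum.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(W, s):
    """Hopfield energy E(s) = -1/2 s^T W s (thresholds taken as zero for simplicity)."""
    return -0.5 * s @ W @ s

# A symmetric weight matrix with zero diagonal; states are +/-1
n = 20
A = rng.normal(size=(n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
s = rng.choice([-1.0, 1.0], size=n)

energies = [energy(W, s)]
for _ in range(200):                       # asynchronous updates: one neuron at a time
    i = rng.integers(n)
    s[i] = 1.0 if W[i] @ s >= 0 else -1.0  # flip toward the local field
    energies.append(energy(W, s))

# The energy never increases, so the net settles into a local minimum
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:]))
print(energies[0], energies[-1])
```

The symmetry of W is essential: with asymmetric weights no such Lyapunov function need exist, which is exactly the broader setting addressed in Energy Functionals for Neural Networks.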
Dynamics and Bifurcation in Neural Nets notes that the powerful qualitative and geometric tools of dynamical systems theory are most useful when the behavior of interest is stationary in the sense that the inputs are at most time or space periodic. It then shows how to analyze what kind of behavior we can expect over the long run for a given neural network. In ANNs, the final state may represent the recognition of an input pattern, the segmentation of an image, or any number of machine computations. The stationary states of biological neural networks may correspond to cognitive decisions (e.g., binding via synchronous oscillations) or to pathological behavior such as seizures and hallucinations. Another important issue that is addressed by dynamical systems theory is how the qualitative dynamics depends on parameters. The qualitative change of a dynamical system as a parameter is changed is the subject of bifurcation theory, which studies the appearance and disappearance of branches of solutions to a given set of equations as some parameters vary. This article shows how to use these techniques to understand how the behavior of neural nets depends on both the parameters and the initial states of the network. Phase-Plane Analysis of Neural Nets complements the study of bifurcations with a technique for studying the qualitative behavior of small systems of interacting neural networks whose neurons are, essentially, leaky integrator neurons. A complete analysis of such networks is impossible, but when there are at most two variables involved, a fairly complete description can be given. The article introduces this qualitative theory of differential equations in the plane, analyzing two-neuron networks that consist of two excitatory cells, two inhibitory cells, or an excitatory and an inhibitory cell.
While planar systems may seem to be a rather extreme simplification, it is argued that in some local cortical circuits we can view the simple planar system as representing a population of coupled excitatory and inhibitory neurons. Computational methods are a very powerful adjunct to this type of analysis. The article concludes with comments on numerical methods and software.
Canonical Neural Models starts from the observation that various models of the same neural structure could produce different results. It thus shows how to derive results that can be observed in a class or a family of models. To exemplify the utility of considering families of neural models instead of a single model, the article shows how to reduce an entire family of Hodgkin-Huxley-type models to a canonical model. A model is canonical for a family if there is a continuous change of variables that transforms any other model from the family into this one. As an example, a canonical phase model is presented for a family of weakly coupled oscillators. The change of variables does not have to be invertible, so the canonical model is usually lower-dimensional, simple, and tractable, and yet retains many important features of the family. For example, if the canonical model has multiple attractors, then each member of the family has multiple attractors.
Chaotic phenomena, in which a deterministic law generates complicated, nonperiodic, and unpredictable behavior, exist in many real-world systems and mathematical models. Chaos has many intriguing characteristics, such as sensitive dependence on initial conditions. Chaos in Biological Systems provides a view of the appearance of this phenomenon of “deterministic randomness” in a variety of models of physical and biological systems. Features used in assessing time series for chaotic behavior include the power spectrum, dimension, Lyapunov exponent, and Poincaré map. Examples are given from ion channels through cellular activity to complex networks, and “dynamical disease” is characterized by qualitative changes in dynamics in biological control systems. However, the high dimensions of biological systems and the environmental fluctuations that lead to nonstationarity make convincing demonstration of chaos in vivo (as opposed to computer models) a difficult matter. Chaos in Neural Systems looks at chaos in the dynamics of axons, neurons, and networks. An open issue is to understand the significance, if any, of observed fluctuations that appear chaotic. Does a neuron function well despite fluctuations in the timing between spikes, or are the irregularities essential to its task? And if the irregularities are essential to the task, is there any reason to expect that deterministic (chaotic) irregularities would be better than random ones? The vexing question of whether chaos adds functionality to neural networks is still open (see also “Synaptic Noise and Chaos in Vertebrate Neurons”). Stochastic Resonance is a nonlinear phenomenon whereby the addition of a random process, or “noise,” to a weak incoming signal can enhance the probability that it will be detected by a system. Information about the signal transmitted through the system is also enhanced. 
The information content or detectability of the signal is degraded for noise intensities that are either smaller or larger than some optimal value. Stochastic resonance has been demonstrated at several levels in biology, from ion channels in cell membranes to animal and human cognition, perception, and, ultimately, behavior.
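A toy threshold detector shows the enabling role of noise, though not the full resonance curve (exhibiting the degradation at high noise would require measuring signal information as well). All parameters here are illustrative: a sinusoid whose peak stays below threshold produces no detections on its own, but moderate added noise lifts it over threshold near the signal's peaks.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossings(noise_sigma, amp=0.5, threshold=1.0, n=20000):
    """Count upward threshold crossings of a subthreshold sinusoid plus noise."""
    t = np.arange(n) * 0.01
    signal = amp * np.sin(2 * np.pi * 0.5 * t)    # peak 0.5, below threshold 1.0
    x = signal + noise_sigma * rng.normal(size=n)
    above = x > threshold
    return np.count_nonzero(above[1:] & ~above[:-1])

quiet = crossings(0.0)   # no noise: the weak signal alone never reaches threshold
noisy = crossings(0.4)   # moderate noise carries the signal over threshold
print(quiet, noisy)      # quiet is 0; noisy is substantially larger
```

Because the noise-induced crossings cluster around the signal's peaks, the crossing times carry information about the subthreshold input, which is the essence of the phenomenon.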
Pattern Formation, Biological presents a general methodology, based on analysis of the largest eigenvalue, for tracing the asymptotic behavior of a dynamical system, and applies it to the problem of biological pattern formation. Turing originally considered the problem of how animal coat patterns develop. He suggested that chemical markers in the skin comprise a system of diffusion-coupled chemical reactions among substances called morphogens. Turing showed that in a two-component reaction-diffusion system, a state of uniform chemical concentration can undergo a diffusion-driven instability, leading to the formation of a spatially inhomogeneous state. In population biology, patchiness in population densities is the norm rather than the exception. In developmental biology, groups of previously identical cells follow different developmental pathways, depending on their position, to yield the rich spectrum of mammalian coat patterns and the patterns found in fishes, reptiles, mollusks, and butterflies. The article closes with a mechanical model of the process of angiogenesis (genesis of the blood supply) and network formation of endothelial cells in the extracellular matrix, as well as a new approach for predicting brain tumor growth. Pattern Formation, Neural shows that the Turing mechanism for spontaneous pattern formation plays an important role in studying two key questions on the large-scale functional and anatomical structure of cortex: How did the structure develop? What forms of spontaneous and stimulus-driven neural dynamics are generated by such a cortical structure? In the neural context, interactions are mediated not by molecular diffusion but by long-range axonal connections. This neural version of the Turing instability has been applied to many problems concerning the dynamics and development of cortex. In the former case, pattern formation occurs in neural activity; in the latter it occurs in synaptic weights. 
In most cases there exists some underlying symmetry in the model that plays a crucial role in the selection and stability of the resulting patterns.
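The diffusion-driven instability just described can be verified by linear analysis: a uniform steady state that is stable without diffusion becomes unstable at finite wavenumbers once the inhibitor diffuses fast enough. The sketch below uses an illustrative reaction Jacobian and diffusion constants (not values from any particular model) and computes the growth rate of spatial mode k from the eigenvalues of J − k²D.

```python
import numpy as np

# Jacobian of the reaction kinetics at the uniform steady state (illustrative numbers):
J = np.array([[1.0, -1.0],
              [3.0, -2.0]])    # trace < 0 and det > 0, so stable without diffusion
Du, Dv = 1.0, 20.0             # Turing's condition: the inhibitor diffuses much faster

def growth_rate(k):
    """Largest real part of the eigenvalues of J - k^2 diag(Du, Dv)."""
    A = J - k**2 * np.diag([Du, Dv])
    return np.max(np.linalg.eigvals(A).real)

ks = np.linspace(0.0, 2.0, 400)
rates = np.array([growth_rate(k) for k in ks])
# The uniform mode (k = 0) decays, but a band of finite wavenumbers grows:
print(growth_rate(0.0), rates.max())
```

The fastest-growing wavenumber sets the spacing of the emerging pattern, whether the "diffusion" is molecular (coat markings) or mediated by lateral axonal connections (cortical maps).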
Complementing this theme of pattern formation, Self-Organization and the Brain contrasts the algorithmic division of labor between programmer and computer in most current man-made computers with the view of the brain as a dynamical system in which ordered structures arise by processes of self-organization. It argues that, whereas the theory of self-organization has so far focused on the establishment of static structures, the nervous system is concerned with the generation of purposeful, nested processes evolving in time. However, if a self-organizing system is to create the appropriate patterns, quite a few control parameters in a system must all be put in the right ballpark. The article argues that, in view of the variability of the physiological state of the nervous system, evolution must have developed general mechanisms to actively and autonomously regulate its systems such as to produce interesting self-organized processes and states. The process of brain organization is seen as a cascade of steps, each one taking place within the boundary conditions established by the previous one, but the theory of such cascades is still nonexistent, posing massive challenges for future research. Cooperative Phenomena offers a related perspective, developing what has been a major theme in physics for the last century: statistical mechanics, which shows how, for example, to average out the individual variations in position and velocity of the myriad molecules in a gas to understand the relationship between pressure, volume, and temperature, or to see how variations in temperature can yield dramatic phase transitions, such as from ice to water or from water to steam. The article places these ideas in a general setting, stressing the notion of an order parameter (such as temperature in the previous example) that describes the macroscopic order of the system and whose variation can yield qualitative changes in system behavior. 
Unlike a control parameter, which is a quantity imposed on the system from the outside, an order parameter is established by the system itself via self-organization. The argument is mainly presented at a general level, but the article concludes by briefly examining cooperative phenomena in neuroscience, including pattern formation (see also Pattern Formation, Biological), EEG, MEG, movement coordination, and hallucinations (see also Pattern Formation, Neural).
Statistical Mechanics of Neural Networks introduces the reader to some of the basic methods of statistical mechanics and shows that they can be applied to systems made up of large numbers of (formal) neurons. Statistical mechanics has studied magnets as lattices with an atomic magnet (modeled as, e.g., a spin that can be up or down) at each lattice point, and this has led to the statistical analysis of neural networks as “spin glasses,” where firing and nonfiring correspond to “spin up” and “spin down,” respectively. It has also led to the study of “Markov Random Field Models in Image Processing,” in which the initial information at each lattice site represents some local features of the raw image, while the final state allows one to read off a processed image.
Collective Behavior of Coupled Phase Oscillators explains the use of phase models (here, the phase is the phase of an oscillation, not the type of phase whose transition is studied in statistical mechanics) to help understand how temporal coherence arises over populations of densely interconnected oscillators, even when their frequencies are randomly distributed. The phase oscillator model for neural populations exemplifies the idea that certain aspects of brain function seem largely independent of the neurophysiological details of the individual neurons, while still capturing phase information, i.e., the kind of information encoded in the specific temporal structure of the sequence of neuronal spikes. The article reviews the collective behavior of coupled oscillators using the phase model and assuming all-to-all interconnections. Despite this simplification, a great variety of collective behaviors is exhibited. Special attention is given to the onset and persistence of collective oscillation in frequency-distributed systems, splitting of the population into a few subgroups (clustering), and the more complex collective behavior called slow switching. Collections of oscillators that send signals to one another can phase lock, with many patterns of phase differences. Chains of Oscillators in Motor and Sensory Systems discusses a set of examples that illustrate how those phases emerge from the oscillator interactions. Much of the work was motivated by spatiotemporal patterns in networks of neurons that govern undulatory locomotion. The original experimental preparation to which this work was applied is the lamprey central pattern generator (CPG) for locomotion, but the mathematics is considerably more general. The article discusses several motor systems, then turns to the procerebral lobe of Limax, the common garden slug, to illustrate chains of oscillators in a sensory system.
Since the details of the oscillators are often unknown and difficult to obtain, the object of the mathematics is to find the consequences of what is known, and to generate sharper questions to motivate further experimentation.
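The standard phase model for such all-to-all coupled populations is the Kuramoto model, which can be simulated in a few lines. This sketch uses illustrative numbers (200 oscillators, Gaussian frequency spread, two coupling strengths) and the mean-field form of the interaction; it shows the onset of collective oscillation, measured by the order parameter r, as coupling increases.

```python
import numpy as np

rng = np.random.default_rng(2)

def order_parameter(theta):
    """|r| = 1 means perfect phase coherence; values near 0 mean incoherence."""
    return np.abs(np.mean(np.exp(1j * theta)))

def kuramoto(K, n=200, dt=0.01, steps=4000):
    """All-to-all phase oscillators: theta_i' = w_i + (K/n) sum_j sin(theta_j - theta_i)."""
    omega = rng.normal(0.0, 0.5, size=n)           # distributed natural frequencies
    theta = rng.uniform(0, 2 * np.pi, size=n)
    for _ in range(steps):
        r = np.mean(np.exp(1j * theta))            # mean-field form of the coupling sum
        theta += dt * (omega + K * np.abs(r) * np.sin(np.angle(r) - theta))
    return order_parameter(theta)

r_weak, r_strong = kuramoto(K=0.1), kuramoto(K=4.0)
print(r_weak, r_strong)   # weak coupling stays incoherent; strong coupling synchronizes
```

Below a critical coupling the frequency spread wins and r stays near zero; above it, a macroscopic fraction of the population locks to a common rhythm, the simplest instance of the collective oscillation discussed above.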
Amplification, Attenuation, and Integration focuses on the computational role of the recurrent connections in networks of leaky integrator neurons. Setting the transfer function f(u) to be simply u in the network equations yields a linear network that can be completely analyzed using the tools of linear systems theory. The article describes the properties of linear networks and gives some examples of their application to neural modeling. In this framework, it is shown how recurrent synaptic connectivity can amplify or attenuate responses; both effects can occur simultaneously in the same network. Besides amplification and attenuation, a linear network can also carry out temporal integration, in the sense of Newtonian calculus, when the strength of feedback is precisely tuned for an eigenmode, so that its gain and time constant diverge to infinity. Finally, it is noted that the linear computations of amplification, attenuation, and integration can be ascribed to a number of brain areas.
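These effects are easiest to see in the scalar case τẋ = −x + wx + u (a single linear neuron with self-weight w); the numbers below are illustrative. Positive feedback amplifies the steady-state response, negative feedback attenuates it, and at w = 1 the gain and time constant diverge and the unit becomes a perfect integrator of its input.

```python
import numpy as np

def steady_state_gain(w):
    """Steady state of tau x' = -x + w x + u under constant input u is x = u / (1 - w)."""
    return 1.0 / (1.0 - w)

# Positive feedback amplifies, negative feedback attenuates:
print(steady_state_gain(0.9), steady_state_gain(-1.0))   # ~10 and 0.5

# At w = 1 the leak cancels exactly, leaving x' = u / tau: a temporal integrator.
tau, dt, u = 1.0, 0.001, 0.2
x = 0.0
for _ in range(5000):                # 5 time units of constant input
    x += dt * (-x + 1.0 * x + u) / tau
print(x)                             # ~ u * t = 0.2 * 5 = 1.0
```

The fragility is also visible: the integrator requires the feedback to be tuned exactly to the leak, which is why precise tuning of an eigenmode is the key condition stated above.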
Winner-Take-All Networks presents a number of designs for neural networks that solve the following problem: given a number of networks, each of which provides as output some “confidence measure,” find in a distributed manner the network whose output is strongest. Two important variants of winner-take-all are k-winner-take-all, where the k largest inputs are identified, and softmax, which consists of assigning each input a weight so that all weights sum to 1 and the largest input receives the biggest weight. The article first describes softmax and shows how winner-take-all can be derived as a limiting case; it then describes how they can both be derived from probabilistic, or energy function, formulations; and it closes with a discussion of VLSI and biological mechanisms. “Modular and Hierarchical Learning Systems” addresses a somewhat related topic: Given a complex problem, find a set of networks, each of which provides an approximate solution in some region of the state space, together with a gating network that can combine these approximations to yield a globally satisfactory solution (i.e., blend the “good” solutions rather than extract the “best” solution).
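The relation between softmax and winner-take-all can be made concrete with a temperature parameter (one common formulation; the article's own derivation may differ): softmax assigns every input a weight, and as the temperature goes to zero the weights collapse onto the largest input.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Weights summing to 1; lower temperature concentrates weight on the maximum."""
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()               # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = [2.0, 1.0, 3.0, 2.5]  # "confidence measures" from four competing networks
print(softmax(scores, temperature=1.0))    # graded weights; the largest input gets most
print(softmax(scores, temperature=0.01))   # nearly one-hot: the winner-take-all limit
```

k-winner-take-all generalizes this by selecting the k largest inputs rather than just one.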
Dynamics of Association and Recall uses dynamical studies to analyze the pattern recall process and its relation with the choice of initial state, the properties of stored patterns, noise level, and network architecture. For large networks and in global recall processes, the strategy is to derive dynamical laws at a macroscopic level (i.e., dependent on many neuron states). The challenge is to find the smallest set of macroscopic quantities which will obey closed deterministic equations in the limit of an infinitely large network. The article focuses on simple Hopfield-type models, but closes with a discussion of some variations and generalizations. Short-Term Memory asks: What are the different types of STM traces? How do intrinsic and synaptic mechanisms contribute to the formation of STM traces? How do STM traces translate into long-term memory representation of temporal sequences? The stress is on computational mechanisms underlying these processes with the suggestion that these mechanisms may well underlie a wide variety of seemingly different biological processes. The article examines both the short-term preservation of patterns of neural firing in a circuit and ways in which short-term maintained activity may be transferred into long-term memory traces.
Adaptive Resonance Theory
Associative Networks
Backpropagation: General Principles
Bayesian Methods and Neural Networks
Bayesian Networks
Competitive Learning
Convolutional Networks for Images, Speech, and Time Series
Data Clustering and Learning
Dynamics of Association and Recall
Ensemble Learning
Evolution and Learning in Neural Networks
Evolution of Artificial Neural Networks
Gaussian Processes
Generalization and Regularization in Nonlinear Learning Systems
Graphical Models: Parameter Learning
Graphical Models: Probabilistic Inference
Graphical Models: Structure Learning
Helmholtz Machines and Sleep-Wake Learning
Hidden Markov Models
Independent Component Analysis
Learning and Generalization: Theoretical Bounds
Learning Network Topology
Learning and Statistical Inference
Learning Vector Quantization
Minimum Description Length Analysis
Model Validation
Modular and Hierarchical Learning Systems
Neocognitron: A Model for Visual Pattern Recognition
Neuromanifolds and Information Geometry
Pattern Recognition
Perceptrons, Adalines, and Backpropagation
Principal Component Analysis
Radial Basis Function Networks
Recurrent Networks: Learning Algorithms
Reinforcement Learning
Self-Organizing Feature Maps
Simulated Annealing and Boltzmann Machines
Statistical Mechanics of Generalization
Statistical Mechanics of On-Line Learning and Generalization
Stochastic Approximations and Efficient Learning
Support Vector Machines
Temporal Pattern Processing
Temporal Sequences: Learning and Global Analysis
Universal Approximators
Unsupervised Learning with Global Objective Functions
Ying-Yang Learning
The majority of articles in this road map deal with learning in artificial neural networks. Nonetheless, the road map is titled “Learning in Artificial Networks” to emphasize the inclusion of a body of research on statistical inference and learning that can be seen either as generalizing neural networks or as analyzing other forms of networks, such as Bayesian networks and graphical models.
The fundamental difference between a system that learns and one that merely memorizes is that the learning system generalizes to unseen examples. Much of our concern is with supervised learning, getting a network to behave in a way that successfully approximates some specified pattern of behavior or input-output relationship. In particular, much emphasis has been placed on feedforward networks which have no loops, so that the output of the net depends on its input alone, since there is then no internal state defined by reverberating activity. The most direct form of this is a synaptic matrix, a one-layer neural network for which input lines directly drive the output neurons and a “supervised Hebbian” rule sets synapses so that the network will exhibit specified input-output pairs in its response repertoire. This is addressed in Associative Networks, which notes the problems that arise if the input patterns (the “keys” for associations) are not orthogonal vectors. Association also extends to recurrent networks, but in such systems of “dynamic memories” (e.g., Hopfield networks) there are no external inputs as such. Rather the “input” is the initial state of the network, and the “output” is the “attractor” or equilibrium state to which the network then settles. For neurons whose output is a sigmoid function of the linear combination of their inputs, the memory capacity of the associative memory is approximately 0.15n, where n is the number of neurons in the net. Unfortunately, such an “attractor network” memory model has many spurious memories, i.e., equilibria other than the memorized patterns, and there is no way to decide whether a recalled pattern was memorized or not. 
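The orthogonality issue noted above can be seen in a minimal outer-product ("supervised Hebbian") memory; the stored patterns here are illustrative. With orthonormal keys, recall is exact; a key that overlaps several stored keys retrieves a blend of their outputs, which is the crosstalk problem.

```python
import numpy as np

# Three orthonormal input patterns (keys) and their desired outputs (values):
keys = np.eye(4)[:3]
values = np.array([[1.0, -1.0],
                   [0.5,  0.5],
                   [-1.0, 2.0]])

# Supervised Hebbian rule: the synaptic matrix is a sum of outer products value_i key_i^T
M = sum(np.outer(v, k) for k, v in zip(keys, values))

for k, v in zip(keys, values):
    assert np.allclose(M @ k, v)          # exact recall for orthonormal keys

# A probe correlated with two stored keys retrieves a mixture of their values (crosstalk):
probe = (keys[0] + keys[1]) / np.sqrt(2)
print(M @ probe)
```

When keys are merely linearly independent rather than orthogonal, the crosstalk can be removed by replacing the Hebbian matrix with a pseudoinverse solution, at the cost of a non-local learning rule.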
Dynamics of Association and Recall (see the road map Dynamic Systems for more details) shows how to move away from microscopic equations at the level of individual neurons to derive dynamical laws at a macroscopic level that characterize association and recall in Hopfield-type networks (with some discussion of variations and generalizations).
Historically, the earliest forms of supervised learning involved changing synaptic weights to oppose the error in a neuron with a binary output (the perceptron error-correction rule), or to minimize the sum of squares of errors of output neurons in a network with real-valued outputs (the Widrow-Hoff rule). This work is charted in Perceptrons, Adalines, and Backpropagation, which also charts the extension of these classic ideas to multilayered networks. In multilayered networks, there is the structural credit assignment problem: When an error is made at the output of a network, how is credit (or blame) to be assigned to neurons deep within the network? One of the most popular techniques is called backpropagation, whereby the error of output units is propagated back to yield estimates of how much a given “hidden unit” contributed to the output error. These estimates are used in the adjustment of synaptic weights to these units within the network. Backpropagation: General Principles places this idea in a broader framework by providing an overview of contributions that enrich our understanding of the pros and cons (such as “plateaus”) of this adaptive architecture. It also assesses the biological plausibility of backpropagation.
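The perceptron error-correction rule is short enough to state in full. This sketch (learning rate and epoch count arbitrary) trains a single threshold unit on logical AND, a linearly separable task for which the perceptron convergence theorem guarantees the rule terminates with a correct weight vector.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """Perceptron error-correction rule for a single binary threshold unit."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if w @ xi >= 0 else 0
            w += lr * (yi - pred) * xi          # weights change only on errors
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w >= 0).astype(int)

# Logical AND is linearly separable, so the rule converges:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print(predict(w, X))    # [0 0 0 1]
```

For XOR, which is not linearly separable, no weight vector exists and the rule cycles forever, which is precisely the limitation that multilayer networks and backpropagation were developed to overcome.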
The underlying theoretical grounding is that, given any function f: X → Y for which X and Y are codable as input and output patterns of a neural network, then, as shown in Universal Approximators, f can be approximated arbitrarily well by a feedforward network with one layer of hidden units. The catch, of course, is that many, many hidden units may be required for a close fit. It is thus often treated as an empirical question whether there exists a sufficiently good approximation achievable in principle by a network of a given size—an approximation that a given learning rule may or may not find (it may, for example, get stuck in a local optimum rather than a global one). Gradient descent methods have also been extended to adapt the synaptic weights of recurrent networks. The backpropagation algorithm for feedforward networks has been successfully applied to a wide range of problems, but what can be implemented by a feedforward network is just a static mapping of the input vectors. However, to model dynamical functions of brains or machines, one must use a system capable of storing internal states and implementing complex dynamics. Recurrent Networks: Learning Algorithms presents, then, learning algorithms for recurrent neural networks that have feedback connections and time delays. In a recurrent network, the state of the system can be encoded in the activity pattern of the units, and a wide variety of dynamical behaviors can be programmed by the connection weights. A popular subclass of recurrent networks consists of those with symmetric connection weights. In this case, the network dynamics is guaranteed to converge to a minimum of some “energy” function (see “Energy Functionals for Neural Networks” and “Computing with Attractors”). However, steady-state solutions are only a limited portion of the capabilities of recurrent networks. 
For example, they can transform an input sequence into a distinct output sequence, and they can serve as a nonlinear filter, a nonlinear controller, or a finite-state machine. This article reviews the learning algorithms for training recurrent networks, with the main focus on supervised learning algorithms. (See “Recurrent Networks: Neurophysiological Modeling” for the use of such networks in modeling biological neural circuitry.)
One useful perspective for supervised learning views learning as hill-climbing in weight space, so that each “experience” adjusts the synaptic weights of the network to climb (or descend) a metaphorical hill for which “height” at a particular point in “weight space” corresponds to some measure of the performance of the network (or the organism or robot of which it is a part). When the aim is to minimize this measure, the learning process is then an example of what mathematicians call gradient descent. The term reinforcement comes from studies of animal learning in experimental psychology, where it refers to the occurrence of an event, in the proper relation to a response, that tends to increase the probability that the response will occur again in the same situation. Reinforcement Learning describes a form of “semisupervised” learning where the network is not provided with an explicit form of error at each time step but rather receives only generalized reinforcement (“you’re doing well”; “that was bad!”), which yields little immediate indication of how any neuron should change its behavior. Moreover, the reinforcement is intermittent, thus raising the temporal credit assignment problem (see also “Reinforcement Learning in Motor Control”): How is an action at one time to be credited for positive reinforcement at a later time? The solution is to build an “adaptive critic” that learns to evaluate actions of the network on the basis of how often they occur on a path leading to positive or negative reinforcement. Methods for this assessment of future expected reinforcement include temporal difference (TD) learning and Q-learning (see “Q-Learning for Robots”). 
Current reinforcement learning research includes parameterized function approximation methods; understanding how exploratory behavior is best introduced and controlled; learning under conditions in which the environment state cannot be fully observed; introducing various forms of abstraction such as temporally extended actions and hierarchy; and relating computational reinforcement learning theories to brain reward mechanisms (see “Dopamine, Roles of”).
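Temporal difference learning, the simplest solution to the temporal credit assignment problem, can be illustrated on a classic five-state random walk (a standard textbook example, not drawn from the article; all parameters illustrative). TD(0) moves each state's value estimate toward the bootstrapped target r + V(s′), so reward information propagates backward through time without waiting for episodes to end.

```python
import numpy as np

rng = np.random.default_rng(3)

# A 5-state corridor: start in the middle, step left or right at random;
# reaching the right end pays reward 1, the left end pays 0 (both terminal).
n_states, alpha, episodes = 5, 0.05, 5000
V = np.zeros(n_states + 2)               # states 1..5, plus terminal states 0 and 6

for _ in range(episodes):
    s = 3
    while 0 < s < 6:
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == 6 else 0.0
        # TD(0): nudge V(s) toward the bootstrapped target r + V(s')
        V[s] += alpha * (r + V[s_next] - V[s])
        s = s_next

print(V[1:6])   # approaches the true values 1/6, 2/6, 3/6, 4/6, 5/6
```

The learned values play the role of the "adaptive critic" described above: they estimate expected future reinforcement, and their one-step prediction errors are the learning signal.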
The task par excellence for supervised learning is pattern recognition—the problem of classifying objects, often represented as vectors or as strings of symbols, into categories. Historically, the field of pattern recognition started with early efforts in neural networks (see Perceptrons, Adalines, and Backpropagation). While neural networks played a less central role in pattern recognition for some years, recent progress has made them the method of choice for many applications. As Pattern Recognition demonstrates, properly designed multilayer networks can learn complex mappings in high-dimensional spaces without requiring complicated hand-crafted feature extractors. To rely more on learning, and less on detailed engineering of feature extractors, it is crucial to tailor the network architecture to the task, incorporating prior knowledge to be able to learn complex tasks without requiring excessively large networks and training sets. Ensemble Learning describes algorithms that, rather than finding one best hypothesis to explain the data, construct a set (sometimes called a committee or ensemble) of hypotheses and then have those hypotheses vote to classify new patterns. Ensemble methods are often much more accurate than any single hypothesis. For example, the representational problem arises when the hypothesis space does not contain any hypotheses that are good approximations to the true decision function f. In some cases, a weighted sum of hypotheses expands the space of functions that can be represented. Hence, by taking a weighted vote of hypotheses, the learning algorithm may be able to form a more accurate approximation to f. The bulk of research into ensemble methods has focused on constructing ensembles of decision trees. The article introduces the techniques of bagging and boosting, among others, and analyzes their relative merits under different conditions.
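The benefit of voting is easy to quantify under the idealized assumption that committee members err independently; real ensemble members are correlated, so the gain in practice is smaller. The simulation below (illustrative numbers) compares a single classifier that is correct 70% of the time against a majority vote of eleven such classifiers.

```python
import numpy as np

rng = np.random.default_rng(4)

def accuracy(p_correct, n_voters, n_trials=20000):
    """Majority vote of independent classifiers, each correct with probability p_correct."""
    votes = rng.random((n_trials, n_voters)) < p_correct   # True = voted for the right class
    majority = votes.sum(axis=1) > n_voters / 2
    return majority.mean()

single = accuracy(0.7, 1)
committee = accuracy(0.7, 11)
print(single, committee)   # the committee is markedly more accurate than one voter
```

Bagging and boosting can be read as two ways of manufacturing approximate independence: bagging by resampling the training data, boosting by reweighting it toward the examples the current ensemble gets wrong.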
Many specific architectures have been developed to solve particular types of learning problem. Adaptive Resonance Theory (ART) bases learning on internal expectations. A pattern matching process compares an external input with the internal memory of various coded patterns. ART matching leads either to a resonant state, which persists long enough to permit learning, or to a parallel memory search. If the search ends at an established code, the memory representation may either remain the same or incorporate new information from matched portions of the current input. When the external world fails to match an ART network’s expectations or predictions, a search process selects a new category, representing a new hypothesis about what is important in the present environment.
The neocognitron (see Neocognitron: A Model for Visual Pattern Recognition) was developed as a neural network model for visual pattern recognition that addresses the specific question, “How can a pattern be recognized despite variations in size and position?” by using a multilayer architecture in which local features are replicated in many different scales and locations. More generally, as shown in Convolutional Networks for Images, Speech, and Time Series, shift invariance in convolutional networks is obtained by forcing the replication of weight configurations across space. Moreover, the topology of the input is taken into account, enabling such networks to force the extraction of local features by restricting the receptive fields of hidden units to be local, and enforcing a built-in invariance with respect to translations, or local distortions of the inputs. The idea of connecting units to local receptive fields on the input goes back to the perceptron in the early 1960s, and was almost simultaneous with Hubel and Wiesel’s discovery of locally sensitive, orientation-selective neurons in the cat’s visual system.
Just as a polynomial of too high a degree is not useful for curve fitting, a network that is too large will fail to generalize well, and will require longer training times. Smaller networks, with fewer free parameters, enforce a smoothness constraint on the function found. For best performance, it is therefore desirable to find the smallest network that will “fit” the training data. To create a neural network, a designer typically fixes a network topology and uses training data to tune its parameters such as connection weights. The designer, however, often does not have enough knowledge to specify the ideal topology. It is thus desirable to learn the topology from training data as well. Learning Network Topology reviews algorithms that adjust network topology, adding neurons and removing neurons during the learning process, to arrive at a network appropriate to a given task. For topology learning, a bias is added to prefer smaller models. It is often found that this bias produces a neural network that has better generalization and is more interpretable. This framework is applied to learning the topologies of both feedforward neural networks and Bayesian Networks. In Bayesian networks, all the nodes of the network are given and fixed, and one searches for a topology by adding or deleting links.
Many articles in the Handbook emphasize situations where, e.g., learning from examples is stochastic in the sense that examples are randomly generated and the network behavior is thus to be analyzed from a statistical point of view. Statistical estimation identifies the mechanism underlying stochastic phenomena. Learning and Statistical Inference studies learning by using such statistical notions as Fisher information, Bayesian loss, and sequential estimation, as well as the Expectation-Maximization (EM) algorithm for estimating hidden variables. Nonlinear neurodynamics, learning, and self-organization are seen as adding new concepts to statistical science. The article examines the dynamical behaviors of a learning network under a general loss criterion. The behavior of learning curves is related to neural network complexity to elucidate the discrepancy between training and generalization errors. This perspective is further developed in Neuromanifolds and Information Geometry. A neural network is specified by its architecture and a number of parameters such as synaptic weights and thresholds. Any neural network of this architecture is specified by a point in the parameter space. Learning takes place in the parameter space and a learning process is represented by a trajectory. The article presents the approach of information geometry which sees the geometrical structure of the parameter space as given by a Riemannian manifold. The article shows how dynamical behaviors of neural learning on these “neuromanifolds” are related to the underlying geometrical structures, using multilayer perceptrons and Boltzmann machines as examples.
Generalization and Regularization in Nonlinear Learning Systems sets forth the essential relationship between multivariate function estimation in a statistical context and supervised machine learning. Given a training set consisting of (input, output) pairs (xi, yi), the task is to construct a map that generalizes well in that, given a new value of x, the map will provide a reasonable prediction for the hitherto unobserved output associated with this x. Regularization simplifies the problem by applying constraints to the construction of the map that reduce the generalization error (see also “Probabilistic Regularization Methods for Low-Level Vision”). Ideally, these constraints embody a priori information concerning the true relationship between input and output, though various ad hoc constraints have sometimes been shown to work well in practice. Feedforward neural nets, radial basis functions, and various forms of splines all provide regularized or regularizable methods for estimating “smooth” functions of several variables from a given training set. Which method to use depends on the particular nature of the underlying but unknown “truth,” the nature of any prior information that might be available about this “truth,” the nature of any noise in the data, the ability of the experimenter to choose the various smoothing or regularization parameters well, and so on.
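The role of a regularizing constraint can be made concrete with a small sketch, here using ridge regression as the simplest regularized estimator (the data, dimensions, and penalty values are illustrative assumptions, not drawn from the article): the penalty term embodies a prior preference for small, smooth solutions.

```python
import numpy as np

# Ridge regression: minimize ||X w - y||^2 + lam * ||w||^2. The penalty
# lam * ||w||^2 plays the role of a regularizer encoding prior information
# that the true map is "simple"; lam is a smoothing parameter the
# experimenter must choose.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # 50 (input, output) training pairs
true_w = np.zeros(10)
true_w[0] = 1.0                          # only one informative direction
y = X @ true_w + 0.1 * rng.normal(size=50)

def ridge_fit(X, y, lam):
    """Closed-form regularized least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_unreg = ridge_fit(X, y, 0.0)           # unconstrained fit
w_reg = ridge_fit(X, y, 10.0)            # regularized fit
norm_unreg = np.linalg.norm(w_unreg)
norm_reg = np.linalg.norm(w_reg)         # regularization shrinks the weights
```

The choice of `lam` is exactly the kind of smoothing-parameter selection problem the article discusses: too small and the fit adapts to noise, too large and genuine structure is smoothed away.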
Modular and Hierarchical Learning Systems solves a complex learning problem by dividing it into a set of subproblems. In the context of supervised learning, modular architectures arise when the data can be described by a collection of functions, each of which works well over a relatively local region of the input space, allocating different modules to different regions of the space. The challenge is that, in general, the learner is not provided with prior knowledge of the partitioning of the input space. To solve this, a “gating network” can learn which module to “listen to” in different situations. The learning algorithms described here solve the credit assignment problem by computing a set of posterior probabilities that can be thought of as estimating the utility of different modules in different parts of the input space. An EM algorithm (cf. Learning and Statistical Inference), an alternative to gradient methods, can be derived for estimating the parameters of both the modular system and its extension to hierarchical architectures. The latter arise when we assume that the data are well described by a multiresolution model in which regions are divided recursively into subregions.
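The gating idea can be sketched in a few lines (the two linear “experts,” the gating weights, and all names here are illustrative assumptions, not the article's model): a gating network outputs mixing coefficients that decide which module to “listen to” for each input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two expert predictors, each accurate in one region of the input space:
experts = [lambda x: 2.0 * x,       # expert suited to x < 0
           lambda x: -1.0 * x]      # expert suited to x >= 0

# Gating network: a linear map of (x, 1) followed by a softmax gives
# posterior-like mixing weights over the experts.
V = np.array([[-5.0, 0.0],          # strongly prefers expert 0 when x < 0
              [ 5.0, 0.0]])         # strongly prefers expert 1 when x > 0

def predict(x):
    g = softmax(V @ np.array([x, 1.0]))          # gate output
    return sum(gi * e(x) for gi, e in zip(g, experts))

y_neg = predict(-2.0)   # gate selects expert 0: about 2 * (-2) = -4
y_pos = predict(+2.0)   # gate selects expert 1: about -1 * 2 = -2
```

In the full architecture the gating parameters `V` and the experts are learned jointly, e.g., with the EM algorithm mentioned in the text; here they are fixed by hand to show the division of labor.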
Bayesian Methods and Neural Networks shows how to apply Bayes’s rule for the use of probabilities to quantify inferences about hypotheses from given data. The idea is to take the predictions p(d|hi) made by alternative models hi about data d, and the prior probabilities of the models p(hi), and obtain the posterior probabilities p(hi|d) of the models given the data, using Bayes’s rule in the form p(hi|d) = p(d|hi)p(hi)/p(d). To apply this to neural networks, regard a supervised neural network as a nonlinear parameterized mapping from an input x to an output y = y(x; w), which depends continuously on the “weights” parameter w. The idea is to choose w from a weight space with some given probability distribution p(w) so as to maximize the likelihood of the nets yielding the given set of (input, output) observations. The Bayesian framework deals with uncertainty in a natural, consistent manner by combining prior beliefs about which models are appropriate with how likely each model would be to have generated the data. This results in an elegant, general framework for fitting models to data that, however, may be compromised by computational difficulties in carrying out the ideal procedure. There are many approximate Bayesian implementations, using methods such as sampling, perturbation techniques, and variational methods. In the case of models linear in their parameters, Bayesian neural networks are closely related to Gaussian Processes (q.v.), where many of the computational difficulties of dealing with more general stochastic nonlinear systems can be avoided. Traditionally, neural networks are graphical representations of functions in which the computations at each node are deterministic. By contrast, networks in which nodes represent stochastic variables are called graphical models (see Bayesian Networks and Graphical Models: Probabilistic Inference).
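The posterior computation p(hi|d) = p(d|hi)p(hi)/p(d) is easy to state numerically; the following sketch uses two hypothetical models and made-up probabilities purely for illustration.

```python
# Bayes's rule for two competing models h0, h1 and one observed datum d.
# All numbers are illustrative assumptions.
prior = [0.5, 0.5]               # p(h0), p(h1)
likelihood = [0.8, 0.2]          # p(d|h0), p(d|h1): each model's prediction of d

# Evidence p(d) = sum_i p(d|hi) p(hi) normalizes the posterior:
evidence = sum(l * p for l, p in zip(likelihood, prior))
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
# With equal priors the posterior simply follows the likelihoods: [0.8, 0.2]
```

The same arithmetic underlies Bayesian neural network fitting, with the weight vector w playing the role of the hypothesis and p(w) the prior; the computational difficulty the article notes arises because the sums become high-dimensional integrals.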
Radial Basis Function Networks applies Bayesian methods to the case where the approximation to the given y = y(x; w) is based on a network using combinations of “radial basis” functions, each of which is “centered” around a weight vector w, so that the response to input x depends on some measure of “distance” of x from w, rather than on the dot product w · x = Σᵢ wᵢxᵢ as in many formal neurons. The distribution of the w’s may be determined by some form of clustering (see Data Clustering and Learning). Further learning adjusts the connection strengths to a neuron whose outputs give an estimate of, e.g., the posterior probability p(c|x) that class c is present given the observation (network input) x. However, it is easier to model other related aspects of the data, such as the unconditional distribution of the data p(x) and the likelihood of the data, p(x|c), and then recreate the posterior from these quantities according to Bayes’s rule, p(ci|x) = p(ci)p(x|ci)/p(x).
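The distinction between distance-based and dot-product-based units can be seen in a short sketch (the centres, width, and output weights are illustrative assumptions): each hidden unit responds according to how far the input lies from its centre.

```python
import numpy as np

# Two radial basis units with Gaussian response; centres and width are
# arbitrary illustrative choices, as are the output weights.
centres = np.array([[0.0, 0.0],
                    [1.0, 1.0]])
width = 0.5

def rbf_features(x):
    """Each unit's activity depends on the distance of x from its centre."""
    d2 = ((centres - x) ** 2).sum(axis=1)        # squared distances to centres
    return np.exp(-d2 / (2 * width ** 2))

out_weights = np.array([1.0, -1.0])              # linear output layer

def rbf_net(x):
    return rbf_features(x) @ out_weights

y_near0 = rbf_net(np.array([0.0, 0.0]))   # dominated by the first unit
y_near1 = rbf_net(np.array([1.0, 1.0]))   # dominated by the second unit
```

In practice the centres would be placed by clustering the data, as the text notes, and only the output weights learned by (regularized) least squares.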
Gaussian Processes continues the Bayesian approach to neural network learning, placing a prior probability distribution over possible functions and then letting the observed data “sculpt” this prior into a posterior using the available data. One can place a prior distribution P(w) on the weights w of a neural network to induce a prior over functions P(y(x;w)) but the computations required to make predictions are not easy, owing to nonlinearities in the system. A Gaussian process, defined by a mean function and a covariance function, can prove useful as a way of specifying a prior directly over function space—it is often simpler to do this than to work with priors over parameters. Gaussian processes are probably the simplest kind of function space prior that one can consider, being a generalization of finite-dimensional Gaussian distributions over vectors. A Gaussian process is defined by a mean function (which we shall usually take to be identically zero), and a covariance function C(x, x′) which indicates how correlated the value of the function y is at x and x′. This function encodes our assumptions about the problem (e.g., that the function is smooth and continuous) and will influence the quality of the predictions. The article shows how to use Gaussian processes for classification problems, and describes how data can be used to adapt the covariance function to the given prediction problem.
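Specifying a prior directly over function space can be sketched concretely (the squared-exponential covariance and length scale are illustrative assumptions): a zero mean function and a covariance function C(x, x′) define a distribution from which whole functions can be sampled.

```python
import numpy as np

# A Gaussian process prior with zero mean and squared-exponential covariance
# C(x, x') = exp(-(x - x')^2 / (2 l^2)); the length scale l encodes the
# smoothness assumption and is an illustrative choice here.
def cov(xs, length_scale=1.0):
    d = xs[:, None] - xs[None, :]
    return np.exp(-d ** 2 / (2 * length_scale ** 2))

xs = np.linspace(0.0, 5.0, 20)        # evaluation points
K = cov(xs)                           # covariance matrix of function values

# Draw three sample functions from the prior N(0, K); a small jitter on the
# diagonal keeps the matrix numerically positive definite.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(xs)), K + 1e-8 * np.eye(len(xs)), size=3)
```

Conditioning this prior on observed (x, y) pairs yields the posterior over functions that the article uses for prediction; nearby inputs have strongly correlated function values, which is exactly the smoothness encoded by C.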
Minimum Description Length Analysis shows how ideas relating to minimum description length (MDL) have been applied to neural networks, emphasizing the direct relationship between MDL and Bayesian model selection methods. The classic MDL approach defined the information in a binary string to be the length of the shortest program with which a general-purpose computer could generate the string. The Bayes bridge is obtained by replacing the Bayesian goal of inferring the “most likely” model M from a set of observations by minimizing the length of an encoded message which describes M as well as the data D expressed in terms of M. MDL and Bayesian methods both formalize Occam’s razor in that a complex network is preferred only if its predictions are sufficiently more accurate.
Unsupervised Learning with Global Objective Functions makes the point that even unsupervised learning involves an implicit training signal based on the network’s ability to predict its own input, or on some more general measure of the quality of its internal representation. The main problem in unsupervised learning research is then seen as the formulation of a performance measure or cost function for the learning to generate this internal supervisory signal. The cost function is also known as an objective function, since it sets the objective for the learning process. The article reviews three types of unsupervised neural network learning procedures: information-preserving algorithms, density estimation techniques, and invariance-based learning procedures. The first method is based on the preservation of mutual information I(x; y) = H(x) − H(x|y) between the input vector x and output vector y, where H(x) is the entropy of random variable x and H(x|y) is the entropy of the conditional distribution of x given y. The second approach is to assume a priori a class of models that constrains the general form of the probability density function and then to search for the particular model parameters defining the density function (or mixture of density functions) most likely to have generated the observed data (cf. the earlier discussion of Bayesian methods). Finally, invariance-based learning extracts higher-order features and builds more abstract representations. Once again, the approach is to make constraining assumptions about the structure that is being sought, and to build these constraints into the network’s architecture and/or objective function to develop more efficient, specialized learning procedures.
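The information-preservation objective I(x; y) = H(x) − H(x|y) is easy to evaluate for a small discrete example; the joint distribution below is a made-up illustration, not data from the article.

```python
import numpy as np

# A small made-up joint distribution p(x, y) over two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)                 # marginal p(x)
p_y = p_xy.sum(axis=0)                 # marginal p(y)

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

H_x = entropy(p_x)
H_joint = entropy(p_xy.ravel())
H_x_given_y = H_joint - entropy(p_y)   # chain rule: H(x|y) = H(x, y) - H(y)
I_xy = H_x - H_x_given_y               # mutual information to be preserved
```

An information-preserving algorithm would adjust network parameters so that this quantity, computed between the network's input and output, stays as large as possible.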
The Bayesian articles stress the “global” statistical idea of “find the weights which, according to given probability distributions, maximize some expectation” as distinct from the deterministic idea of adjusting the weights at each time step to provide a local increment in performance on the current input. However, gradient descent provides an important tool for finding the weight settings which decrease some stochastic expectation of error, too. Stochastic Approximations and Efficient Learning shows that gradient descent has a long tradition in the literature of stochastic approximation. Any stochastic process that can be interpreted as minimizing a cost function based on noisy gradient measurements in a sequential, recursive manner may be considered to be a stochastic approximation. “Sequential” means that each estimate of the location of a minimum is used to make a new observation, which in turn immediately leads to a new estimate; “recursive” means that the estimates depend on past gradient measurements only through a fixed number of scalar statistics. Such on-line algorithms are useful because they enjoy significant performance advantages for large-scale learning problems. The article describes their properties using stochastic approximation theory as a very broad framework, and provides a brief overview of newer insights obtained using information geometry (see Neuromanifolds and Information Geometry) and replica calculations (see Statistical Mechanics of On-Line Learning and Generalization).
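A minimal stochastic approximation can be written down directly (the one-dimensional cost and schedule are illustrative assumptions): each noisy gradient measurement immediately updates the estimate, with a decreasing step size in the classical Robbins-Monro style.

```python
import numpy as np

# Minimize the expected cost E[(w - x)^2] / 2 over w, given only one noisy
# sample x_t at a time; the minimizer is the mean of the data distribution.
rng = np.random.default_rng(0)
target = 3.0                            # true mean of the (illustrative) data
w = 0.0                                 # initial estimate
for t in range(1, 5001):
    x_t = target + rng.normal()         # noisy observation
    grad = w - x_t                      # noisy gradient of the cost at w
    w -= (1.0 / t) * grad               # sequential, recursive update, step 1/t
```

With the 1/t schedule this recursion is exactly the running mean of the observations, illustrating why such on-line estimates converge to the minimizer while storing only a fixed number of scalar statistics.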
In order to understand the performance of learning machines, and to gain insight that helps to design better ones, it is helpful to have theoretical bounds on the generalization ability of the machines. The determination of such bounds is the subject of Learning and Generalization: Theoretical Bounds. Here it is necessary to formalize the learning problem and turn the question of how well a machine generalizes into a mathematical question. The article adopts the formalization used in statistical learning theory, which is shown to include both pattern recognition and function learning. The road map Computability and Complexity gives more information on this and related articles, such as “PAC Learning and Neural Networks” and “Vapnik-Chervonenkis Dimension of Neural Networks,” which offer bounds on the performance of learning methods. Support Vector Machines addresses the (binary) pattern recognition problem of learning theory: given two classes of objects, to assign a new object to one of the two classes. Trying to find the best classifier involves notions of similarity in the set X of inputs. Support vector machines (SVMs) build a decision function as a kernel expansion corresponding to a separating hyperplane in a feature space. SVMs rest on methods for the selection of the patterns on which the kernels are centered and on the choice of weights that are placed on the individual kernels in the decision function. SVMs and other kernel methods have a number of advantages compared to classical neural network approaches, such as the absence of spurious local minima in the optimization procedure, the need to tune only a few parameters, and modularity in the design. Kernel methods connect similarity measures, nonlinearities, and data representations in linear spaces where simple geometric algorithms are performed.
The passage of the “energy” of a Hopfield network to a local minimum can be construed as a means for solving an optimization problem. The catch is the word “local” in local minimum—the solution may be the best in the neighborhood, yet far better solutions may be located elsewhere. One resolution of this is described in Simulated Annealing and Boltzmann Machines. At the expense of great increases in time to convergence, simulated annealing escapes local minima by adding noise, which is then gradually reduced (“lowering the temperature”). The initially high temperature (i.e., noise level) stops the system from getting trapped in “high valleys” of the energy landscape, while the lowering of temperature allows optimization to occur in the “deepest valley” once it has been found. The Boltzmann machine then applies this method to design a class of neural networks. These machines use stochastic computing elements to extend discrete Hopfield networks in two ways: they replace the deterministic, asynchronous dynamics of Hopfield networks with a randomized local search dynamics, and they replace the Hebbian learning rule with a more powerful stochastic learning algorithm.
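The annealing schedule can be sketched on a one-dimensional energy landscape (the energy function, proposal scale, and cooling schedule are all illustrative choices): early noise lets the search escape a shallow valley, and the falling temperature then traps it in the deeper one.

```python
import numpy as np

# An illustrative landscape with a shallow valley near x = -1 and a deeper
# (global) valley near x = 2, separated by a barrier.
def energy(x):
    return 0.5 * (x + 1) ** 2 * (x - 2) ** 2 - x

rng = np.random.default_rng(1)
x = -1.0                                   # start in the shallow valley
best_x, best_E = x, energy(x)
T = 2.0                                    # initial temperature (noise level)
for step in range(20000):
    x_new = x + rng.normal(scale=0.5)      # random local move
    dE = energy(x_new) - energy(x)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        x = x_new                          # accept uphill moves with prob e^(-dE/T)
    if energy(x) < best_E:
        best_x, best_E = x, energy(x)
    T = max(0.01, T * 0.999)               # gradually lower the temperature
```

Pure gradient descent from x = -1 would stay in the shallow valley; the stochastic acceptance rule is what carries the search over the barrier while the temperature is still high.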
Turning from neural networks to another form of network structure, Bayesian Networks (as distinct from Bayesian Methods and Neural Networks) provides an explicit method for following chains of probabilistic inference such as those appropriate to expert systems, extending Bayes's rule for updating probabilities in the light of new evidence. The nodes in a Bayesian network represent propositional variables of interest and the links represent informational or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node, given its parents in the network. The network supports the computation of the probabilities of any subset of variables, given evidence about any other subset, and the reasoning processes can operate on Bayesian networks by propagating information in any direction. Helmholtz Machines and Sleep-Wake Learning starts by observing that since unsupervised learning is largely concerned with finding structure among sets of input patterns, it is important to take advantage of cases in which the input patterns are generated in a systematic way, thus forming a manifold that has many fewer dimensions than the space of all possible activation patterns. The Helmholtz machine is an analysis-by-synthesis model. The key idea is to have an imperfect generative model train a better analysis or recognition model, and an imperfect recognition model train a better generative model. The generative model for the Helmholtz machine is a structured belief network (i.e., Bayesian network) that is viewed as a model for hierarchical top-down connections in the cortex. New inputs are analyzed in an approximate fashion using a second structured belief network (called the recognition model), which is viewed as a model for the standard, bottom-up connections in cortex. The generative and recognition models are learned from data in two phases.
In the wake phase, the recognition model is used to estimate the underlying generators for a particular input pattern, and then the generative model is altered so that those generators are more likely to have produced the input that is actually observed. In the sleep phase, the generative model fantasizes inputs by choosing particular generators stochastically, and then the recognition model is altered so that it is more likely to report those particular generators if the fantasized input were actually to be observed. Ying-Yang Learning further develops this notion of simultaneously building up two pathways, a bottom-up pathway for encoding a pattern in the observation space into its representation in a representation space, and a top-down pathway for decoding or reconstructing a pattern from an inner representation back to a pattern in the observation space. The theory of Bayesian Ying-Yang harmony learning formulates the two-pathway approach in a general statistical framework, modeling the two pathways via two complementary Bayesian representations of the joint distribution on the observation space and representation space. The article shows how a number of major learning problems and methods can be seen as special cases of this unified perspective. Moreover, the ability of Ying-Yang learning for regularization and model selection is placed in an information-theoretic perspective.
Graphical Models: Probabilistic Inference introduces a generalization of Bayesian networks. The graphical models framework provides a clean mathematical formalism that has made it possible to understand the relationships among a wide variety of network-based approaches to computation, and in particular to understand many neural network algorithms and architectures as instances of a broader probabilistic methodology. Graphical models use graphs to represent and manipulate joint probability distributions. The graph underlying a graphical model may be directed, in which case the model is often referred to as a belief network or a Bayesian network, or the graph may be undirected, in which case the model is generally referred to as a Markov random field. A graphical model has both a structural component, encoded by the pattern of edges in the graph, and a parametric component, encoded by numerical “potentials” associated with sets of edges in the graph. General inference algorithms allow statistical quantities (such as likelihoods and conditional probabilities) and information-theoretic quantities (such as mutual information and conditional entropies) to be computed efficiently. The article closes by noting that many neural network architectures are special cases of the general graphical model formalism, both representationally and algorithmically. Special cases of graphical models include essentially all models of unsupervised learning, as well as Boltzmann machines, mixtures of experts, and radial basis function networks, while many other neural networks, including the classical multilayer perceptron, can be profitably analyzed from the viewpoint of graphical models. The next two articles present learning algorithms that build on these inference algorithms and allow parameters and structures to be estimated from data. Graphical Models: Parameter Learning discusses the learning of parameters for a fixed graphical model. 
As noted, each node in the graph represents a random variable, while the edges in the graph represent the qualitative dependencies between the variables; the absence of an edge between two nodes means that any statistical dependency between these two variables is mediated via some other variable or set of variables. The quantitative dependencies between variables that are connected via edges are specified via parameterized conditional distributions, or more generally nonnegative “potential functions.” The pattern of edges is the structure of the graph, while the parameters of the potential functions are parameters of the graph. The present article assumes that the structure of the graph is given, and shows how to then learn the parameters of the graph from data. Graphical Models: Structure Learning turns to the simultaneous learning of parameters and structure. Real-world applications of such learning abound, the example presented being an analysis of data regarding factors that influence the intention of high school students to attend college. For simplicity, the article focuses on directed-acyclic graphical models, but the basic principles thus defined can be applied more generally. The Bayesian approach is emphasized, and then several common non-Bayesian approaches are mentioned briefly.
Competitive Learning is a form of unsupervised learning in which each input pattern comes, through learning, to be associated with the activity of one or at most a few neurons, leading to sparse representations of data that are easy to decode. Competitive learning algorithms employ some sort of competition between neurons in the same layer via lateral connections. This competition limits the set of neurons to be affected in a given learning trial. Hard competition allows the final activity of only one neuron, the strongest one to start with, whereas in soft competition the activity of the lateral neurons does not necessarily drive all but one to zero. One form of competitive learning algorithm can be described as an application of a successful single-neuron learning algorithm in a network with lateral connections between adjacent neurons. The lateral connections are needed so that each neuron can be inhibited from adapting to a feature of the data already captured by other neurons. A second family of algorithms uses the competition between neurons for improving, sharpening, or even forming the features extracted from the data by each single neuron. Data Clustering and Learning emphasizes the related idea of data clustering: discovering and emphasizing structure that is hidden in a data set (e.g., the pronounced similarity of groups of data vectors) in an unsupervised fashion. There is a delicate trade-off: not to superimpose too much structure, and yet not to overlook structure. The choice of data representation predetermines what kind of cluster structures can be discovered in the data. Formulating the search for clusters as an optimization problem then supports validation of clustering results by checking that the cluster structures found in a data set vary little from one data set to a second data set generated by the same data source.
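Hard competition can be sketched in a few lines (the two-cluster data, unit count, and learning rate are illustrative assumptions): on each trial only the winning unit, the one whose weight vector is closest to the input, is allowed to adapt.

```python
import numpy as np

# Two clusters of 2-D data, centred at (0, 0) and (3, 3) (illustrative).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
                  rng.normal([3.0, 3.0], 0.1, size=(100, 2))])
rng.shuffle(data)

W = data[:2].copy()                           # two competing units, seeded from data
eta = 0.1                                     # learning rate
for x in data:
    winner = np.argmin(((W - x) ** 2).sum(axis=1))   # hard competition
    W[winner] += eta * (x - W[winner])        # only the winner adapts toward x

# After training, each cluster centre should have a unit nearby:
centres = [np.zeros(2), np.full(2, 3.0)]
dists = [np.linalg.norm(W - c, axis=1).min() for c in centres]
```

The winner-take-all rule plays the role of the inhibitory lateral connections described above: because only the winner moves, each unit is prevented from adapting to a region of the input space another unit has already captured.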
The two tasks of clustering, density estimation and data compression, are tightly related by the fact that the correct identification of the probability model of the source yields the best code for data compression. Principal Component Analysis shows how, in data compression applications like image or speech coding, a distribution of input vectors may be economically encoded, with small expected values of the distortions, in terms of the eigenvectors corresponding to the largest eigenvalues of the correlation matrix that describes the distribution of these patterns (these eigenvectors are the “principal components”). However, it is usually not possible to find the eigenvectors on-line. The ideal solution is then replaced by a neural network learning rule embodying a constrained optimization problem that converges to the solution given by the principal components. Independent Component Analysis (ICA) is a linear transform of multivariate data designed to make components of the resulting random vector as statistically independent (factorial) as possible. In signal processing it is used to attack the problem of the blind separation of sources, for example of audio signals that have been mixed together by an unknown process (the “cocktail party effect”). In the area of neural networks and brain theory, it is an example of an information-theoretic unsupervised learning algorithm. When an ICA network is trained on an ensemble of natural images, it learns localized-oriented receptive fields qualitatively similar to those found in area V1 of mammalian visual cortex. ICA has been used to decompose multivariate brain data into components that help us understand task-related spatial and temporal brain dynamics. Thus the same neural network algorithm is being used both as an explanation of brain properties and as a method of probing the brain.
Where principal component analysis (PCA) uses second-order statistics (the covariance matrix) to remove correlations between the elements of a vector, ICA uses statistics of all orders. PCA attempts to decorrelate the outputs, while ICA attempts to make the outputs statistically independent. The most widely used adaptive, on-line method for ICA is also the most “neural-network-like” and is the one described in the body of this article.
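The compression role of the principal components can be made concrete (the synthetic correlated data are an illustrative assumption): encoding each vector by its projection on the leading eigenvector of the covariance matrix retains most of the variance.

```python
import numpy as np

# Synthetic 2-D data stretched along the direction (1, 1): standard
# deviations 3.0 and 0.3 along two orthogonal axes, then rotated 45 degrees.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2)) * np.array([3.0, 0.3])
R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)   # rotation matrix
X = z @ R.T

C = np.cov(X.T)                          # covariance (here, correlation) matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
align = abs(pc1 @ np.array([1.0, 1.0]) / np.sqrt(2))   # alignment with (1, 1)

# Compress: one scalar code per 2-D point, then rank-1 reconstruction.
codes = X @ pc1
X_hat = np.outer(codes, pc1)
mse = ((X - X_hat) ** 2).mean()          # small expected distortion
```

The on-line neural rules mentioned in the text (e.g., Hebbian rules with weight normalization) converge to this same `pc1` without ever forming the covariance matrix explicitly; an ICA network would instead seek directions making the codes statistically independent rather than merely uncorrelated.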
Self-Organizing Feature Maps introduces the self-organizing feature map (SOFM or SOM; also known as a Kohonen map), a nonlinear method by which features can be obtained with an unsupervised learning process. It is based on a layer of adaptive “neurons” that gradually develops into an array of feature detectors. The linking of input signals to response locations in the map can be viewed as a nonlinear projection from a signal or input space to the (usually) 2D map layer. The learning method is an augmented Hebbian method in which learning by the element most responsive to an input pattern is “shared” with its neighbors. The resulting “compressed image” of the (usually higher-dimensional) input space has the property of a topographic map that reflects important metric and statistical properties of the input signal distribution: distance relationships in the input space (expressing, e.g., pattern similarities) are approximately preserved as distance relationships between corresponding excitation sites in the map, and clusters of similar input patterns tend to become mapped to areas of the neural array whose size varies in proportion to the frequency of the occurrence of their patterns. This resembles in many ways the structure of topographic feature maps found in many brain areas, for which the SOFM offers a neural model that bridges the gap between microscopic adaptation rules postulated at the single neuron or synapse level and the formation of experimentally more accessible macroscopic patterns of feature selectivity in neural layers. From a statistical point of view, the SOFM provides a nonlinear generalization of principal component analysis and has proved valuable in many application contexts.
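The shared-update rule can be sketched for a one-dimensional map of scalar inputs (the map size, learning rate, and neighbourhood schedule are illustrative choices): learning by the winner is shared with its array neighbours, so the weights come to cover the input range in topographic order.

```python
import numpy as np

# A 1-D Kohonen map of 10 units for scalar inputs drawn uniformly from [0, 1].
rng = np.random.default_rng(0)
n_units = 10
W = rng.random(n_units)                       # initial weights, one per map unit

for t in range(2000):
    x = rng.random()                          # input sample
    winner = np.argmin(np.abs(W - x))         # most responsive unit
    sigma = max(0.5, 2.0 * (1 - t / 2000))    # shrinking neighbourhood width
    for j in range(n_units):
        h = np.exp(-((j - winner) ** 2) / (2 * sigma ** 2))
        W[j] += 0.1 * h * (x - W[j])          # neighbours share the winner's update

spread = W.max() - W.min()                    # map should cover the input range
ordered = np.all(np.diff(W) > 0) or np.all(np.diff(W) < 0)  # topographic order
```

The neighbourhood function `h` is what distinguishes this from plain competitive learning: without it the units would still quantize the input distribution, but their arrangement along the array would bear no relation to their position in the input space.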
In order to give a quantitative answer to the question of how well the trained network will be able to classify an input that it has not seen before, it is common to assume that all inputs, both from the training set and the test set, are produced independently and at random. Clearly, the generalization error depends on the specific algorithm that was used during the training, and its calculation requires knowledge of the network weights generated by the learning process. In general, these weights will be complicated functions of the examples, and an explicit form will not be available in most cases. The methods of statistical mechanics provide an approach to this problem, which often enables an exact calculation of learning curves in the limit of a very large network. In the statistical mechanics approach one studies the ensemble of all networks that implement the same set of input/output examples to a given accuracy. In this way the typical generalization behavior of a neural network (in contrast to the worst or optimal behavior) can be described. We thus turn to two articles that apply the methods introduced in the article “Statistical Mechanics of Neural Networks”: Statistical Mechanics of On-Line Learning and Generalization emphasizes on-line learning in which training examples are dealt with one at a time, while Statistical Mechanics of Generalization emphasizes off-line or memory-based methods, where learning is guided by the minimization of a cost function as averaged over the whole training set. From a statistical physics point of view, the distinction is between systems that can be thought of as being in a state of thermal equilibrium (off-line ≈ on-equilibrium) and away-from-equilibrium situations where the network is not allowed to extract all possible information from a set of examples (on-line ≈ off-equilibrium).
While on-line learning is an intrinsically stochastic process, the restriction to large networks, together with assumptions about the statistical properties of the inputs, permits a concise description of the dynamics in terms of coupled ordinary differential equations. These deterministic equations govern the average evolution of quantities that completely define the macroscopic state of the ANN. The average is taken with respect to the data, which is straightforward if the presented examples are statistically independent. The probability that the network will make a mistake on the new input defines its generalization error for a given training set. Its average over many realizations of the training set, as a function of the number of examples, gives the so-called learning curve. Calculation of the learning curve requires knowledge of the network weights generated by the learning process, for which an explicit form will not be available in most cases. The methods of statistical mechanics provide an approach to this problem, in many cases yielding an exact calculation of learning curves in the “thermodynamic limit” of a very large network in which the network size increases in proportion to the number of training examples, while the statistical or information-theoretic approach is applicable to the learning curve of a medium-size network (cf. Learning and Statistical Inference).
Model Validation shows how the data analyst tries to infer a “model” that summarizes functional dependencies that may be observed in a given set of empirical data. A good model fit should reproduce the behavior of the studied system in the parameter range to be explained by the model study. Model complexity has to be controlled to avoid both missing essential features of the system (underfitting) and adapting to irrelevant fluctuations in the data (overfitting). Model validation provides the crucial step in modeling between model synthesis and analysis, assessing how appropriate the model is to gain insight into the real-world system. Model validation can make use of bounds of the VC type (cf. “Vapnik-Chervonenkis Dimension of Neural Networks”), which usually contain a complexity term that accounts for the flexibility of the hypothesis class and a fitting term that measures the contraction of measure due to the large number of samples. It is shown how these terms can be controlled either by numerical methods like cross-validation and bootstrap or by analytical techniques from computational learning theory. The trade-off between model complexity and goodness of fit and its relation to the computational complexity of learning remains a deep challenge for research.
Hidden Markov Models describes the use of deterministic and stochastic finite state automata for sequence processing, with special attention to hidden Markov models as tools for the processing of complex piecewise stationary sequences. It also describes a few applications of ANNs to further improve these methods. HMMs allow complex sequential learning problems to be solved by assuming that the sequential pattern can be decomposed into piecewise stationary segments, with each stationary segment parameterized in terms of a stochastic function. The HMM is called “hidden” because there is an underlying stochastic process (i.e., the sequence of states) that is not observable but that affects the observed sequence of events.
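The likelihood computation underlying HMM recognition can be sketched in a few lines (the standard forward algorithm, here for a discrete two-state model whose parameters are invented for illustration):

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward algorithm: likelihood P(obs | HMM) for a discrete HMM.

    pi : initial state distribution, shape (S,)
    A  : state transition matrix,    shape (S, S)
    B  : emission probabilities,     shape (S, V)
    obs: sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]                 # joint prob. of state and first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # propagate, then emit
    return float(alpha.sum())

# Two hidden "regimes", each a stationary emission distribution
pi = np.array([0.6, 0.4])
A  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
B  = np.array([[0.7, 0.3],     # state 0 favours symbol 0
               [0.1, 0.9]])    # state 1 favours symbol 1
p = forward([0, 0, 1, 1], pi, A, B)
```

The hidden state sequence never appears explicitly: the forward recursion sums over all state paths, which is exactly what makes the model “hidden.”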
Temporal Pattern Processing notes that time is embodied in a temporal pattern in two different ways: the temporal order among the components of a sequence and the temporal duration of the elements (see also “Sequence Learning”). A sequence is defined as complex if it contains repetitions of the same subsequence; otherwise it is simple. For the generation of complex sequences, the correct successor can be determined only by knowing components prior to the current one. We refer to the prior subsequence required to determine the current component as the context of the component. Temporal processing requires that a neural network have a short-term memory (STM) capacity in order to maintain a component for some time. Time warping is challenging because we would like to have invariance over limited warping, but dramatic changes in relative duration must be recognized as different patterns. Another fundamental ability of human information processing is chunking, which, in the context of temporal processing, means that frequently encountered and meaningful subsequences organize into chunks that form basic units for further chunking at a higher level. Temporal Sequences: Learning and Global Analysis studies how elementary pattern sequences may be represented in neural structures at a low architectural and computational cost, seeking to understand mechanisms to memorize spatiotemporal associations in a robust fashion within model neural networks. The article focuses on formal neural networks where the interplay between neural and synaptic dynamics and, in particular, the role of transmission delays can be analyzed using methods from nonlinear dynamics and statistical mechanics. Among the questions studied is how to train a network so that its limit cycles will resemble taught sequences. Such simplified systems are necessarily caricatures of biological structures yet suggest aspects that are important for more elaborate approaches to real neural systems.
Evolution of Artificial Neural Networks adds another temporal dimension to the biological process of adaptation, namely, that of evolution. Rather than adapt the weights of a single network to solve a problem in the network’s “lifetime,” the evolutionary approach applies the methodology of genetic algorithms to evolve a population of neural networks over several generations so that the population becomes better and better suited to some computational ecology. Evolution and Learning in Neural Networks extends this selection of networks on the basis of the result of their adaptation to the environment through lifetime learning. The article shows how studies of ANNs that are subjected both to an evolutionary and a lifetime learning process have been conducted to look at the advantages, in terms of performance, of combining two different adaptation techniques or to help understand the role of the interaction between learning and evolution in natural organisms.
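The evolutionary approach can be sketched in miniature (truncation selection with Gaussian mutation and elitism, applied to the weights of a tiny tanh network on the XOR task; population size, mutation scale, and generation count are arbitrary choices, and real genetic algorithms add crossover and richer genotypes):

```python
import numpy as np

def net(w, x):
    """Tiny 2-2-1 tanh network; w is a flat genome of 9 weights."""
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = np.tanh(x @ W1 + b1)
    return np.tanh(h @ W2 + b2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([-1, 1, 1, -1], float)              # XOR coded as +/-1

def fitness(w):
    return -np.mean((net(w, X) - T) ** 2)        # higher is better

rng = np.random.default_rng(0)
pop = rng.standard_normal((30, 9))               # initial population of genomes
history = []
for gen in range(300):
    scores = np.array([fitness(w) for w in pop])
    history.append(float(scores.max()))
    parents = pop[np.argsort(scores)[-10:]]      # truncation selection
    children = parents[rng.integers(0, 10, size=20)] \
        + 0.3 * rng.standard_normal((20, 9))     # Gaussian mutation
    pop = np.vstack([parents, children])         # elitism: parents survive
best = pop[np.argmax([fitness(w) for w in pop])]
```

Because the best genomes are carried over unchanged, the best fitness in the population can never decrease across generations; lifetime learning, as discussed above, would add a second, within-generation adaptation loop on top of this.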
Analog Neural Networks, Computational Power
Learning and Generalization: Theoretical Bounds
Neural Automata and Analog Computational Complexity
PAC Learning and Neural Networks
Universal Approximators
Vapnik-Chervonenkis Dimension of Neural Nets
The 1930s saw the definition of an abstract notion of computability when it was discovered that three classes of functions on the natural numbers, f: ℕ → ℕ, coincided: the functions computable by a Turing machine (an abstraction from following a finite set of rules to calculate on a finite but extendible tape, each square of which could hold one of a fixed set of symbols); the lambda-definable functions (functions expressible in Church’s lambda calculus, which later inspired the programming language LISP); and the general recursive functions (a class of functions obtained from very simple numerical functions by repeated application of composition, minimization, etc.). As general-purpose electronic computers were developed and used in the 1940s and 1950s, it was firmly established that these computable functions were precisely the functions that could be computed by such computers with suitable programs, provided there were no limitations on computer memory or computation time. This set the stage for the development of complexity theory in the 1960s and beyond: to chart the different subsets of the computable functions that would be obtained when restrictions were placed on computing resources.
Many classification or pattern recognition tasks can be formulated as mappings between subsets of multidimensional vector spaces by using a suitable coding of inputs and outputs, and many types of feedforward networks are universal in the sense that, given enough adjustable synaptic weights, they can approximate any mapping between subsets of Euclidean spaces. Universal Approximators surveys recent developments in the mathematical theory of feedforward networks and includes proofs of the universal approximation capabilities of perceptron and radial basis function networks with general activation and radial functions, and provides estimates of rates of approximation. The article also characterizes sets of multivariable functions that can be approximated without the “curse of dimensionality,” which is an exponentially fast scaling of the number of parameters with the number of variables.
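The flavor of such approximation results can be conveyed by a constructive sketch (not one of the article's proofs): steep sigmoids placed on a grid build a staircase whose step heights are the increments of the target function, so the network output tracks the target to within its variation over one grid cell. The steepness and grid size below are illustrative.

```python
import numpy as np

def sigmoid_approx(f, n, steep=200.0):
    """Approximate f on [0, 1] by a one-hidden-layer sigmoid network.

    n steep sigmoids, one per grid interval, form a staircase whose
    steps are the increments of f between knots."""
    grid = np.linspace(0, 1, n + 1)
    steps = np.diff(f(grid))                    # output weights: jump sizes
    centers = grid[:-1] + 0.5 / n               # sigmoid transition points
    def g(x):
        x = np.asarray(x, float)
        z = np.clip(steep * (x[:, None] - centers[None, :]), -50.0, 50.0)
        acts = 1.0 / (1.0 + np.exp(-z))         # hidden-unit activations
        return f(grid[0]) + acts @ steps
    return g

f = lambda x: np.sin(2 * np.pi * np.asarray(x))
g = sigmoid_approx(f, 50)
xs = np.linspace(0, 1, 400)
max_err = float(np.max(np.abs(g(xs) - f(xs))))
```

Increasing the number of hidden units shrinks the grid spacing and hence the error; the mathematical theory sharpens this naive construction into rates of approximation and conditions that avoid the curse of dimensionality.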
Neural Automata and Analog Computational Complexity explores what happens when the discrete operations of conventional automata theory are replaced by a computing model in which operations on real numbers are treated as basic. Whereas classical automata describe digital machines, neural models frequently require a framework of analog computation defined on a continuous phase space, with a dynamics characterized by the existence of real constants that influence the macroscopic behavior of the system. Moreover, unlike the flow in digital computation, analog models do not include local discontinuities. Neural networks with real weights are more powerful than traditional models of computation in that they can compute more functions within given time bounds. However, the practicality of an approach based on infinite precision real operations remains to be seen. Nonetheless, the new attention to real numbers has renewed complexity theory and introduced many open problems in computational learning theory and neural network theory. The article thus pays special attention to analog computation in the presence of noise. Analog Neural Networks, Computational Power then analyzes the exact and approximate representational power of feedforward and recurrent neural nets with synchronous update, with a brief discussion of networks of spiking neurons and their relation to sigmoidal nets. Learning complexity increases with increasing representational power of the underlying neural model and care has to be exercised to strike a balance between representational power on the one hand and learning complexity on the other. However, the emphasis of the article is on representational power, i.e., on what can be represented with networks using a given set of activation functions, rather than on learning complexity. Splines (i.e., piecewise polynomial functions) have turned out to be powerful approximators, and they are used here as the benchmark class of activation functions. 
Much attention is given to studying the properties that a class of activation functions needs to reach the approximation power of splines.
PAC Learning and Neural Networks discusses the “probably approximately correct” (PAC) learning paradigm as it applies to ANNs. Roughly speaking, if a large enough sample of randomly drawn training examples is presented, then it should be likely that, after learning, the neural network will classify most other randomly drawn examples correctly. The PAC model formalizes the terms “likely” and “most.” The two main issues in PAC learning theory are how many training examples should be presented, and whether learning can be achieved using a fast algorithm. These are known, respectively, as the sample complexity and computational complexity problems. PAC learning makes use of the Vapnik-Chervonenkis dimension (VC-dimension) as a combinatorial parameter that measures the “expressive power” of a family of functions. This parameter is described more fully in Vapnik-Chervonenkis Dimension of Neural Nets. Bounds for the VC-dimension of a neural net N provide estimates for the number of random examples that are needed to train N so that it has good generalization properties (i.e., so that the error of N on new examples from the same distribution is very small, with probability very close to 1). Typically, the VC-dimension for a class of networks grows polynomially (in many cases, between linearly and quadratically) with the number of adjustable parameters of the neural network. In particular, if the number of training examples is large compared to the VC-dimension, the network’s performance on training data is a reliable indication of its future performance on subsequent data. The bounds on training set size tend to be large, since they provide generalization guarantees simultaneously for any probability distribution on the examples and for any training algorithm that minimizes disagreement on the training examples. Tighter bounds are available for some special distributions and specific training algorithms. 
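The VC-dimension can be made concrete by brute force on toy function classes (thresholds and intervals on a small discrete domain; the resulting values 1 and 2 are classical):

```python
import itertools

def shatters(hypotheses, points):
    """True if the class realises every +/- labelling of `points`."""
    labellings = {tuple(h(p) for p in points) for h in hypotheses}
    return len(labellings) == 2 ** len(points)

def vc_dimension(hypotheses, domain, max_d=4):
    """Largest set size (up to max_d) on which some subset of the
    domain is shattered by the hypothesis class."""
    d = 0
    for k in range(1, max_d + 1):
        if any(shatters(hypotheses, s)
               for s in itertools.combinations(domain, k)):
            d = k
    return d

domain = range(6)
# Threshold functions x >= t: can shatter one point but never two
thresholds = [lambda x, t=t: x >= t for t in range(7)]
d_thresh = vc_dimension(thresholds, domain)
# Intervals a <= x <= b (a > b gives the empty set): shatter pairs, not triples
intervals = [lambda x, a=a, b=b: a <= x <= b
             for a in range(6) for b in range(6)]
d_interval = vc_dimension(intervals, domain)
```

No pair of points can be shattered by thresholds (the labelling “left positive, right negative” is unrealizable), and no triple by intervals (the labelling “positive, negative, positive” is unrealizable), which is exactly the combinatorial fact the VC-dimension records.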
This theme is further developed in Learning and Generalization: Theoretical Bounds in relation to three learning problems: pattern recognition, regression estimation, and density estimation. Because of the looseness of its bounds as well as the difficulty of evaluating them, VC theory was until recently largely neglected by practitioners. This has changed markedly with the development of support vector machines. Using nonlinear similarity measures, referred to as kernels, one can reduce a large class of learning algorithms to linear algorithms in an associated feature space. For the linear algorithms, a VC analysis can be carried out, identifying precisely the factors that need to be controlled to achieve high generalization ability in a variety of learning tasks. “Support Vector Machines” casts these factors into a convex optimization framework, leading to efficient and mathematically well-founded algorithms that have been shown to produce state-of-the-art results on a large variety of problems.
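The kernel idea can be sketched with kernel ridge regression, a close relative of the support vector approach (the Gaussian kernel and regularization constant below are illustrative choices): replacing inner products by a kernel carries out linear regression in an implicit feature space without ever constructing that space.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, gamma=1.0, lam=1e-3):
    """Kernel ridge regression: solve (K + lam*I) alpha = y; predictions
    are kernel expansions sum_i alpha_i k(x, x_i)."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xnew: rbf_kernel(Xnew, X, gamma) @ alpha

X = np.linspace(-3, 3, 40)[:, None]
y = np.sin(X[:, 0])                       # smooth target, no noise
predict = kernel_ridge_fit(X, y)
err = float(np.max(np.abs(predict(X) - y)))
```

The learning algorithm only ever touches the kernel matrix, so the VC-style analysis can be carried out for the linear machine in feature space, which is the structure the support vector framework exploits.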
Adaptive Resonance Theory
Collicular Visuomotor Transformations for Gaze Control
Color Perception
Contour and Surface Perception
Cortical Population Dynamics and Psychophysics
Directional Selectivity
Dissociations Between Visual Processing Modes
Dynamic Link Architecture
Dynamic Remapping
Face Recognition: Neurophysiology and Neural Technology
Face Recognition: Psychology and Connectionism
Fast Visual Processing
Feature Analysis
Gabor Wavelets and Statistical Pattern Recognition
Global Visual Pattern Extraction
Imaging the Visual Brain
Information Theory and Visual Plasticity
Kalman Filtering: Neural Implications
Laminar Cortical Architecture in Visual Perception
Markov Random Field Models in Image Processing
Motion Perception, Elementary Mechanisms
Motion Perception: Navigation
Neocognitron: A Model for Visual Pattern Recognition
Object Recognition
Object Recognition, Neurophysiology
Object Structure, Visual Processing
Ocular Dominance and Orientation Columns
Orientation Selectivity
Perception of Three-Dimensional Structure
Probabilistic Regularization Methods for Low-Level Vision
Pursuit Eye Movements
Retina
Sensor Fusion
Stereo Correspondence
Synchronization, Binding and Expectancy
Tensor Voting and Visual Segmentation
Visual Attention
Visual Cortex: Anatomical Structure and Models of Function
Visual Scene Perception, Neurophysiology
Visual Scene Segmentation
The topic of Vision has provided one of the most fertile fields of investigation both for brain theorists and for technologists constructing ANNs. Five articles in the road map Mammalian Brain Regions—Retina, Collicular Visuomotor Transformations for Gaze Control, Thalamus, Visual Cortex: Anatomical Structure and Models of Function, and Visual Scene Perception, Neurophysiology—introduce various brain regions associated with vision. It is important to emphasize the role of “active vision” in gaining information relevant for animals and robots considered as real-time perception-action systems. This is a theme that is further developed in the road maps Neuroethology and Evolution and Mammalian Motor Control. Nonetheless, many articles in the present road map will analyze vision as the process of discovering from images what is present in the world: we may see active vision as more like the mode of vision employed by the “where/how” system described in Visual Scene Perception, Neurophysiology, whereas “passive” vision may be closer to the role of the “what” pathway. Dissociations Between Visual Processing Modes explores the notion that the visual system has two kinds of jobs to do. One is to support visual cognition, the other is to drive visually guided behavior. Qualitative information about location may be adequate for cognition, but the sensorimotor function needs quantitative egocentrically calibrated spatial information to guide motor acts. The article reviews evidence from neurophysiology, neurological analysis of patients, and psychophysics that the two systems should be modeled as separate maps of visual space rather than as a single visual representation with two readouts. Moreover, spatial information can flow from the cognitive to the sensorimotor representation, but not in the other direction.
However, even “passive” vision is not so passive, since attentional mechanisms are constantly moving the eyes to foveate on items of particular relevance to the current interests of the organism. Collicular Visuomotor Transformations for Gaze Control reviews the role of the superior colliculus in the control of gaze shifts (combined eye-head movements) and its possible involvement in the control of eye movements in 3D space (direction and depth). During attentive fixation, the “Vestibulo-Ocular Reflex” (VOR) and slow vergence maintain binocular foveal fixation to correct for body movements. When the task requires inspection of an eccentric stimulus, a complex synergy of coordinated movements comes into play. Such refixations typically involve a rapid combined eye-head movement (saccadic gaze shift) and often require binocular adjustment in depth (vergence). By virtue of its topographical organization, the superior colliculus has become a key area for experimental and modeling approaches to the question of how sensory signals can be transformed into goal-directed movements. Interestingly, the superior colliculus is not driven by visual input alone. Auditory and somatosensory cues are transformed to register with the visual map in the colliculus for the control of saccades. Sensor Fusion picks up this theme of ways in which sensory information can be brought together in the brains of diverse animals (snakes, cats, monkeys, and humans) and surveys biologically inspired technological implementations (such as the use of infrared to enhance vision). Pursuit Eye Movements takes us from saccadic “jumps” to those smooth eye movements involved in following a moving target. Current models of pursuit include “image motion” models, “target velocity” models, and models that address the role of prediction in pursuit. 
These models make no explicit reference to the neural structures that might be responsible, but the article analyzes the neural pathways for pursuit, stressing the importance of both visual areas of the cerebral cortex and oculomotor regions of the cerebellum.
The Retina, the outpost of the brain that contains both light-sensitive receptors and several layers of neurons that “preprocess” these responses, transforms visual signals in a multitude of ways to code properties of the visual world such as contrast, color, and motion. The article suggests that much of the retina’s signal coding and structural detail is derived from the need to optimally amplify the signal and eliminate noise. But retinal circuitry is diverse. The exact details are probably related to the ecological niche occupied by the organism. In mammals, the retinal output branches into two pathways, the collicular pathway and the geniculostriate pathway. The destination of the former is the midbrain region known as the superior colliculus, discussed above. Visual Cortex: Anatomical Structure and Models of Function reviews features of the microcircuitry of the target of the geniculostriate pathway, the primary visual cortex (area V1), and discusses the physiological properties of cells in its different laminae. It then outlines several hypotheses as to how the anatomical structure and connections might serve the functional organization of the region. For example, a connectionist model of layer IVc of V1 demonstrated that the gradient of change in properties of the layer could indeed be replicated using dendritic overlap through the lower two-thirds of the IVc layer. However, it was insufficient to explain the continuous and sharply increasing field size and contrast sensitivity observable near the top of the layer. The article shows how this discrepancy led to new experiments and related changes in the model which resulted in a good replication of the actual physiological data and required only feedforward excitation. The article goes on to analyze the anatomical substrates for orientation specificity and for surround modulation of visual responses, and concludes by discussing the origins of patterned anatomical connections. 
Ocular Dominance and Orientation Columns discusses further properties of cells in layer IVc of V1. When these cells are tested to see which eye drives them more strongly, it is found that ocular dominance takes the form of a zebra-stripe-like pattern of alternating dominance. Within this high-level organization are “hypercolumns” devoted to a particular retinotopic region of visual space, each hypercolumn being further refined into columns whose cells are best responsive to edges of the same specific orientation. The article reviews data on the orientation specificity of V1 cells and their columnar organization, and presents models of how these structures might form through self-organization during development. Gabor Wavelets and Statistical Pattern Recognition shows how the response properties of many cells in primary visual cortex may be better described by what are called “Gabor wavelets” than as simple edge detectors. Each Gabor wavelet responds best to patterns of a given spatial frequency and orientation within a given neighborhood. The article relates this notion to both biology and technology. The detection of edge information from within a visual scene is an essential component of visual processing. This processing is believed to be initiated in the primary visual cortex, where individual neurons are known to act as feature detectors of the orientation of edges within the visual scene. Individual neurons can have an orientation preference (the edge orientation to which the neuron responds best) and orientation selectivity (which measures how sharply the neuron is tuned to that orientation).
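A Gabor wavelet is simply a sinusoidal grating windowed by a Gaussian envelope; the sketch below (with arbitrary filter parameters) shows that such a filter responds far more strongly to a grating at its preferred orientation and frequency than to an orthogonal one.

```python
import numpy as np

def gabor(size=15, wavelength=6.0, theta=0.0, sigma=3.0, phase=0.0):
    """2-D Gabor wavelet: a sinusoidal carrier under a Gaussian envelope.

    Responds best to gratings of spatial frequency 1/wavelength
    oriented at angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

g = gabor(theta=0.0)
y, x = np.mgrid[-7:8, -7:8]
vert = np.cos(2 * np.pi * x / 6.0)     # grating matched to the filter
horiz = np.cos(2 * np.pi * y / 6.0)    # orthogonal grating
r_pref = float(np.sum(g * vert))       # response = filter-image correlation
r_orth = float(np.sum(g * horiz))
```

Banks of such filters at several orientations, frequencies, and positions give the wavelet-like image representations that the article connects to both cortical physiology and statistical pattern recognition.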
Orientation Selectivity focuses on mechanisms of orientation selectivity in the visual cortex, arguing that the orientation preference of each neuron and the orderly orientation preference map in cortex are likely to be consequences of a pattern of feedforward convergence. However, the selectivity observed in steady-state and orientation dynamics experiments cannot be achieved by a purely feedforward model. Corticocortical inhibition is a crucial ingredient in the emergence of orientation selectivity in the visual cortex, while the relative importance of corticocortical excitation in enhancing orientation selectivity is still under investigation but appears to be more significant for the function of complex cells than for simple cells in V1. Moving beyond the orientation features of primary visual cortex, Information Theory and Visual Plasticity demonstrates some aspects of information theory that are relevant to relaying information in cortex and connects entropy-based methods, projection pursuit, and extraction of simple cells in visual cortex. Feature Analysis offers a more general view of the characterization of visual features based on the redundancy of the visual signal and the transformation of the signal as it passes along the visual pathway. Describing a particular cell as an “x detector” implies that the cell responds when and only when that particular feature is present (e.g., an edge detector responds only in the presence of an edge), but the article argues that describing cells in the early visual system as “detectors” of any type of feature is misleading. Features are useful for describing natural images because the latter have massive informational redundancy. Image space itself is too vast to search directly. Feature analysis depends on the proposition that the search for particular objects can be concentrated in a subspace of image space, the feature space. 
Localized receptive fields in primary visual cortex provide the primitive basis set for the feature space of vision. These form the basis for the elaboration of neurons responding selectively to geometrical features in area TE of the inferotemporal cortex (IT), and these in turn form the basis for object recognition in different but overlapping areas of IT.
Given that cells in the early stages of the visual system, at least, provide a distributed (more or less retinotopic) set of “features” (in some suitably general sense, given the above caution, of patterns that yield the best response rather than patterns that yield the only response), the issue arises of how those features that correspond to a single object in the visual scene are bound together. Contour and Surface Perception introduces parallel interacting subsystems that follow complementary processing strategies. Boundary formation proceeds by spatially linking oriented contrast measures along smooth contour patterns, while perceptual surface attributes, such as lightness or texture, are derived from local ratio measures of image contrast of regions lying within contours. Mechanisms of both subsystems mutually interact to resolve initial ambiguities and to generate coherent representations of surface layout. Representations of intrinsic scene characteristics are constrained in terms of the consistency of the set of solutions, which often involve smoothness assumptions for correlated feature estimates. These consistency constraints are typically based on the laws of physical image generation. The article reviews fundamental approaches to computation of intrinsic scene characteristics and various neural models of boundary and surface computation. Each model involves lateral propagation of signals to interpolate and smooth sparse estimates.
Adaptive Resonance Theory (ART) bases learning on internal expectations. A pattern matching process (both for visual patterns and in other domains) compares an external input with the internal memory code for various patterns. ART matching leads either to a resonant state, which persists long enough to permit learning, or to a parallel memory search. If the search ends at an established code, the memory representation may either remain the same or incorporate new information from matched portions of the current input. When the external world fails to match an ART network’s expectations or predictions, a search process selects a new category, representing a new hypothesis about what is important in the present environment. Laminar Cortical Architecture in Visual Perception uses the LAMINART model (an extension of ART) to propose functional roles for cortical layers in visual perception. Neocortex has an intricate design that exhibits a characteristic organization into six distinct cortical layers, but few models have addressed the functional utility of the laminar organization itself in the control of behavior. LAMINART integrates data about visual perception and neuroscience for such processes as preattentive grouping and attention. It is suggested that the functional roles for cortical layers proposed here—binding together distributed cortical data through a combination of bottom-up adaptive filtering and horizontal associations, and modulating it with top-down attention—generalize, with appropriate specializations, to other forms of sensory and cognitive processing.
Cortical Population Dynamics and Psychophysics models cortical population dynamics to explain dynamical properties of the primate visual system on different levels, ranging from single-neuron properties, such as selectivity for the orientation of a stimulus, up to higher cognitive functions related to the binding and processing of stimulus features in psychophysical discrimination experiments. On the other hand, Synchronization, Binding and Expectancy argues that the “binding” of cells that correspond to a given visual object may exploit another dimension of cellular firing, namely, the phase at which a cell fires within some overall rhythm of firing. The article presents data consistent with the proposal that the synchronization of responses on a time scale of milliseconds provides an efficient mechanism for response selection and binding of population responses. Synchronization also increases the saliency of responses because it allows for effective spatial summation in the population of neurons receiving convergent input from synchronized input cells. Visual Scene Segmentation tackles the segmentation of a visual scene into a set of coherent patterns corresponding to objects. Objects appear in a natural scene as the grouping of similar sensory features and the segregation of dissimilar ones. Studies in visual perception, in particular Gestalt psychology, have uncovered a number of principles for perceptual organization, such as proximity, similarity, connectedness, and relatedness in memory. Scene segmentation requires neural networks to address the binding problem. The temporal correlation approach is to encode the binding by the correlation of temporal activities of feature-detecting cells. A special form of temporal correlation is oscillatory correlation, where the basic units are neural oscillators. The article first reviews nonoscillatory approaches in scene segmentation, and then turns to oscillatory approaches.
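The oscillatory correlation idea can be sketched with Kuramoto phase oscillators (a deliberately minimal model, not one of the networks in these articles): oscillators coupled within, but not between, two groups synchronize within each group, so relative phase can tag which features belong together.

```python
import numpy as np

def simulate_kuramoto(K, omega, theta0, dt=0.05, steps=2000):
    """Euler integration of Kuramoto phase oscillators:
    dtheta_i/dt = omega_i + sum_j K[i, j] * sin(theta_j - theta_i)."""
    theta = theta0.copy()
    for _ in range(steps):
        coupling = (K * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
        theta = theta + dt * (omega + coupling)
    return theta % (2 * np.pi)

rng = np.random.default_rng(0)
n = 6
# Two "objects": oscillators 0-2 and 3-5 coupled within but not between
K = np.zeros((n, n))
K[:3, :3] = 0.5
K[3:, 3:] = 0.5
omega = np.full(n, 1.0)                 # identical natural frequencies
theta = simulate_kuramoto(K, omega, rng.uniform(0, 2 * np.pi, n))
coherence = lambda ph: float(abs(np.exp(1j * ph).mean()))
r_within = coherence(theta[:3])         # near 1 once the group phase-locks
```

Each coupled group locks into a common phase while the groups drift independently of one another, which is the oscillatory analogue of labeling two segments of a scene.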
The temporal correlation approach is further developed in Dynamic Link Architecture, which views the brain’s data structure as a graph composed of nodes connected by links, where both units and links bear activity variables changing on the rapid time scale of fractions of a second. The nodes play the role of symbolic elements. Dynamic links constitute the glue by which higher data structures are built up from more elementary ones.
Beyond the basic issue of how the visual scene is segmented (how visual elements are grouped) into possibly meaningful wholes lies the question of determining, for each such region, its color, motion, distance, shape, and so on. These issues are addressed in the next set of articles. Color Perception stresses that color is not a local property inferred from the wavelength of light hitting a patch of retina but is a property of regions of space that depends both on the light they reflect and on the surrounding context. Our visual system “recreates” the world in the form of boundaries that contain surfaces, and color perception involves the perception of aspects of these surfaces. The two problems of color constancy are to match surfaces with the same reflectance properties across different parts of the visual scene and across different illuminants. In addition, wavelength signals can be used in the course of perceiving form or motion independent of their role in the subjective experience of color. Directional Selectivity first reviews models of retinal direction selectivity (which contributes to oculomotor responses rather than motion perception). Older models depend on the way in which amacrine and other cells of the retina are connected to the ganglion cells, the retinal output cells. A newer model is based on the directionality of synaptic interactions on the dendrites of amacrine cells, involving a spatial asymmetry in the inputs and outputs of a dendrodendritic synapse, and its shunting inhibition. It is argued that development of this latter mechanism might involve Hebbian processes driven by spontaneous activity and light. Cortical directional selectivity (which does contribute to motion perception as well as the control of eye movements) involves many cortical regions.
Directionally sensitive cells in primary visual cortex (V1) project to middle temporal cortex (MT) where directional selectivity becomes more complex, MT cells typically having larger receptive fields. From MT, the motion pathway projects to the medial superior temporal area (MST). Cortical directional selectivity has been modeled in three ways: as a spatially asymmetric excitatory drive followed by multiplication or squaring, via a spatially asymmetric nonlinear inhibitory drive, and through a spatially asymmetric linear inhibitory drive followed by positive feedback. This selectivity might involve Hebbian processes driven by spontaneous activity and binocular interactions. The issues in this article have some overlap with those presented in Motion Perception, Elementary Mechanisms, which emphasizes measurement of the direction and speed of movement of features in the 2D image linking successive views to infer optic flow, which is the pattern of image velocities that is projected onto the retina. The article discusses the cortical correlates of these various representations. Motion Perception: Navigation shows how, when an observer moves through the world, the optic flow can inform him about his own motion through space and about the 3D structure and motion of objects in the scene. This information is essential for tasks such as the visual guidance of locomotion through the environment and the manipulation and recognition of objects. This article focuses on the recovery of observer motion from optic flow. It includes strategies for detecting moving objects and avoiding collisions, discusses how optic flow may be used to control actions, and describes the neural mechanisms underlying heading perception.
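The correlation-type elementary motion detector that lies behind many such models can be sketched as follows (a Reichardt-style detector; the drifting-grating stimulus and unit delay are illustrative choices): each unit multiplies a receptor signal with the delayed signal from its neighbor and subtracts the mirror-image term, so the sign of the summed output indicates the direction of motion.

```python
import numpy as np

def reichardt_response(stimulus, delay=1):
    """Correlation-type (Reichardt-style) motion detector output.

    stimulus: array of shape (time, space).  Each detector correlates
    one receptor with the delayed signal of its neighbour and subtracts
    the mirror-symmetric correlation."""
    s = np.asarray(stimulus, float)
    a, b = s[:, :-1], s[:, 1:]                    # neighbouring receptors
    a_del = np.roll(a, delay, axis=0)[delay:]     # a delayed by `delay` steps
    b_del = np.roll(b, delay, axis=0)[delay:]
    out = b[delay:] * a_del - a[delay:] * b_del   # preferred minus null arm
    return float(out.sum())

t, x = np.mgrid[0:40, 0:20]
rightward = np.sin(2 * np.pi * (x - t) / 10)      # grating drifting rightward
leftward  = np.sin(2 * np.pi * (x + t) / 10)      # grating drifting leftward
r_right = reichardt_response(rightward)           # positive
r_left  = reichardt_response(leftward)            # negative
```

The multiplication here plays the role of the nonlinearity (multiplication or squaring) in the cortical models above; opposite directions of drift drive the detector with opposite sign.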
Global Visual Pattern Extraction continues the study of neural mechanisms which mediate between the extraction of local edge and contour information by orientation-selective simple cells in primary visual cortex (V1) and the high levels of cortical form vision in inferior temporal cortex (IT), where many neurons are sensitive to complex global patterns, including objects and faces. The ventral form vision pathway includes at least areas V1, V2, V4, TEO, and TE (the highest level of IT), raising the question of what processes occur at these intervening stages to transform local V1 orientation information into global pattern representations. Essentially the same question may be posed in cortical motion processing along the dorsal pathway comprising V1, V2, MT, MST, and higher parietal areas. V1 neurons extract only local motion vectors perpendicular to moving edge segments, while MST neurons are sensitive to complex optic flow patterns, including expansion. This article suggests answers to these analogous questions about transitions from local to global processing in both motion and form vision by focusing on intermediate levels of these two pathways, mainly V4 and MST.
Perception of Three-Dimensional Structure reviews various computational models for inferring an object’s 3D structure from different types of optical information, such as shading, texture, motion, and stereo, and examines how the performance of these models compares with the capabilities and limitations of human observers in judging different aspects of 3D structure under varying viewing conditions. In particular, stereoscopic vision exploits the fact that points in a 3D scene will in general project to different positions in the images formed in the left and right eyes. The differences in these positions are termed disparities. The stereo correspondence problem is to identify which points in a pair of stereo images correspond to a single point in 3D space. Solving this problem allows the stereo pair to be mapped into a single representation, called a disparity map, that makes explicit the disparities of various points common to both images, thus revealing the distance of various visual elements from the observer. Depth perception is then completed by determining depth values for all points in the images. Stereo Correspondence notes that various constraints have been used to help determine which features in the two images should be matched in inferring depth. These include compatibility of matching primitives, cohesivity, uniqueness, figural continuity, and the ordering constraint. Various neural network stereo correspondence algorithms are then reviewed, and the problems of surface discontinuities and uncorrelated points, and of transparency, are addressed. The article also reviews neurophysiological studies of disparity mechanisms.
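To make the disparity-map idea concrete, the following toy block-matching sketch (not drawn from the articles; the patch size, sum-of-absolute-differences cost, and search range are illustrative assumptions) finds, for each pixel of the left image, the horizontal shift into the right image that best matches a small surrounding patch:

```python
import numpy as np

def disparity_map(left, right, max_disp=16, patch=5):
    """Toy block-matching stereo: for each pixel of the left image,
    find the horizontal shift (disparity) into the right image that
    minimizes the sum of absolute differences over a small patch."""
    h, w = left.shape
    half = patch // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            ref = left[y-half:y+half+1, x-half:x+half+1]
            costs = [np.abs(ref - right[y-half:y+half+1,
                                        x-d-half:x-d+half+1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

Real correspondence algorithms replace this exhaustive local search with the constraints listed above (uniqueness, continuity, ordering) to resolve ambiguous matches.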
A more abstract approach to the correspondence problem, from the perspective of computer vision rather than psychology or neurophysiology, is offered in Tensor Voting and Visual Segmentation. In 3D, as we have seen, surfaces are inferred from binocular images by obtaining depth hypotheses for points and/or edges. In image sequence analysis, the estimation of motion and shape starts with local measurements of feature correspondences, which yield noisy data for the subsequent computation of scene information. Hence, any salient structure estimator must be able to handle multiple structures and their interactions in the presence of noisy data. This article analyzes approaches to early- to midlevel vision problems, emphasizing the tensor voting methodology for the robust inference of multiple salient structures such as junctions, curves, regions, and surfaces from any combination of points, curve elements, and surface patch element inputs in 2D and 3D. The article describes two regularization formalisms, one that imposes certain physical constraints so that the search space is constrained and the problem becomes algorithmically tractable, and another that uses a Bayesian formalism to transform an ill-posed problem into one of functional optimization.
Probabilistic Regularization Methods for Low-Level Vision offers regularization theory (cf. “Generalization and Regularization in Nonlinear Learning Systems”) as a general mathematical framework to deal with the fact that the problem of inferring 3D structure from 2D images is ill-posed: there are many spatial configurations compatible with a given 2D image or set (motion sequence, stereo pair, etc.) of images. The issue then becomes to find which spatial configuration is most probable. We have already seen a number of constraints associated with stereo vision. Deterministic regularization theory defines a “cost function,” which combines a measure of how close a spatial configuration comes to yielding the given image (set) with a measure of the extent to which the configuration violates the constraints, and then seeks that configuration which minimizes this cost. The present article emphasizes a more general probabilistic approach in which the “actual” field f and the observed field g are considered as realizations of random fields, with the reconstruction of f understood as an estimation problem. Markov Random Field Models in Image Processing views the task of image modeling as being one of finding an adequate representation of the intensity distribution of a given image. What is adequate often depends on the task at hand. The general properties of the local spatiotemporal structure of images or image sequences are characterized by a Markov random field (MRF) in which the probability distribution for the image intensity and a further set of other attributes (edges, texture, and region labels) at a particular location are conditioned on values in a neighborhood of pixels (picture elements or image points). The observed quantities are usually noisy, blurred images. The article presents five steps of MRF image modeling within a Bayesian estimation/inference paradigm, and provides a number of examples. 
Particular attention is paid to maximum a posteriori (MAP) estimates. MRF image models have proved versatile enough to be applied to image and texture synthesis, image restoration, flow field segmentation, and surface reconstruction.
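The deterministic cost-function idea, and its correspondence to MAP estimation under a Gaussian MRF smoothness prior with Gaussian observation noise, can be illustrated with a toy 1D reconstruction (the quadratic penalty terms and the value of the weight lam are assumptions chosen for illustration, not taken from either article):

```python
import numpy as np

def regularize_1d(g, lam=5.0):
    """Reconstruct f from noisy observations g by minimizing the cost
    E(f) = sum_i (f_i - g_i)**2 + lam * sum_i (f_{i+1} - f_i)**2.
    Setting dE/df = 0 gives the linear system (I + lam * D'D) f = g,
    with D the first-difference operator; the same f is the MAP
    estimate when the smoothness term is read as a Gaussian MRF prior
    and the data term as Gaussian observation noise."""
    n = len(g)
    D = np.diff(np.eye(n), axis=0)       # (n-1) x n first-difference matrix
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, g)
```

The first term measures how close the configuration comes to yielding the observed data; the second measures how strongly it violates the smoothness constraint; lam trades the two off.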
Kalman Filtering: Neural Implications introduces Kalman filtering, which, under linear and Gaussian conditions, produces a recursive estimate of the hidden state of a dynamic system, i.e., one that is updated with each subsequent (noisy) measurement of the observed system. The article shows how Kalman filtering provides insight into visual recognition and the role of the cerebellum in motor control. In particular, it presents a hierarchically organized neural network for visual recognition, with each intermediate level of the hierarchy receiving two kinds of information: bottom-up information from the preceding level, and top-down information from the higher level. For its implementation, the model uses a multiscale estimation algorithm that may be viewed as a hierarchical form of the extended Kalman filter and that simultaneously learns the feedforward, feedback, and prediction parameters of the model on the basis of visual experiences in a dynamic environment. The resulting adaptive process involves a fast dynamic state-estimation process that allows the dynamic model to anticipate incoming stimuli, as well as a slow Hebbian learning process that provides for synaptic weight adjustments in the model.
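The predict/correct recursion itself can be sketched for a scalar state (this is the generic textbook Kalman filter, not the article's multiscale model; the parameter values are arbitrary):

```python
import numpy as np

def kalman_1d(zs, a=1.0, q=1e-3, r=0.09, x0=0.0, p0=1.0):
    """Scalar Kalman filter for dynamics x_t = a*x_{t-1} + process noise
    (variance q), observed as z_t = x_t + measurement noise (variance r).
    Each step first predicts from the previous estimate (a top-down
    expectation), then corrects with the new measurement (bottom-up)."""
    x, p, estimates = x0, p0, []
    for z in zs:
        x, p = a * x, a * a * p + q            # predict
        k = p / (p + r)                        # Kalman gain
        x, p = x + k * (z - x), (1.0 - k) * p  # correct with innovation
        estimates.append(x)
    return np.array(estimates)
```

The gain k weighs the measurement against the prediction according to their relative uncertainties, which is what makes the estimate recursive: no past measurements need be stored.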
Imaging the Visual Brain addresses functional brain imaging of visual processes, with emphasis on limits in spatial and temporal resolution, constraints on subject participation, and trade-offs in experimental design. The article focuses on retinotopy, visual motion perception and visual object representation, and voluntary modulation of attention and visual imagery, emphasizing some of the areas where modeling and brain theory might be testable using current imaging tools. Visual Attention offers data and hypotheses for cortical mechanisms to overtly and covertly shift attention (i.e., with and without eye movements). Attention guides where to look next based on both bottom-up (image-based) and top-down (task-dependent) cues—and indeed, the anatomy of the visual system includes extensive feedback connections from later stages and horizontal connections within each layer. Vision appears to rely on sophisticated interactions between coarse, massively parallel, full-field preattentive analysis systems and the more detailed, circumscribed, and sequential attentional analysis system. The article focuses on the brain areas involved in visual attention and then analyzes a variety of relevant mechanisms. Yet, having stressed the way in which we normally take a number of shifts of attention to fully take in the details of a visual scene, it is intriguing to learn how much can be absorbed in a single fixation. Fast Visual Processing notes that much information can be extracted from briefly glimpsed scenes, even at presentation rates of around 10 frames/s, a technique known as rapid serial visual presentation (RSVP). Since interspike intervals for neurons are seldom shorter than 5 ms, the underlying algorithms should involve no more than about 20 sequential, though massively parallel, steps. There is an important distinction in neural computation between feedforward processing models and those with recurrent connections that allow feedback and iterative processing.
Pure feedforward models (e.g., multilayer perceptrons, MLPs) can operate very quickly in parallel hardware. The article argues that even in systems that use extensive recurrent connections, the fastest behavioral responses may essentially depend on a single feedforward processing wave. It looks at how detailed measurements of processing speed can be combined with anatomical and physiological constraints to constrain models of how the brain performs such computations.
There is a vast literature on pattern recognition in neural networks (see, for example, “Pattern Recognition” and “Concept Learning”). Here we discuss articles on face recognition and object recognition. The recognition of other individuals, and in particular the recognition of faces, is a major prerequisite for human social interaction and indeed has been shown to employ specific brain mechanisms. The ability to recognize people from their faces is part of a spectrum of related skills that include face segmentation (i.e., finding faces in a scene or image) and estimation of the pose, direction of gaze, and the person’s emotional state. Face Recognition: Neurophysiology and Neural Technology starts with a review of relevant neurophysiology. Brain injury can lead to prosopagnosia, the loss of ability to recognize individual faces, while leaving intact the ability to recognize general objects. Single-unit recordings in the IT cortex of macaque monkeys have revealed neurons with a high responsiveness to the presence of a face, an individual, or the expression on the face, and neural models for face recognition are reviewed in relation to such data. The article then focuses on computational theories that are inspired by neural ideas (see Dynamic Link Architecture; Gabor Wavelets and Statistical Pattern Recognition) but that find their justification in the construction of successful computer systems for the recognition of human faces even when the gallery of possible faces is very large indeed. Face Recognition: Psychology and Connectionism provides a brief history of connectionist approaches to face recognition and surveys the broad range of tasks to which these models have been applied. The article relates the models to psychological theories for the subtasks of representing faces and retrieving them from memory, comparing human and model performance along these dimensions.
Object Recognition focuses on models of viewpoint-invariant object recognition that are constrained by psychological data on human object recognition. It presents three main approaches to object recognition—invariant based, model based, and appearance based—and analyzes the strengths of each of these in a framework of decision complexity, noting the trade-off between representations that emphasize invariance and those designed for discriminability. The analysis shows that it is unlikely for a single form of representation to satisfy all kinds of object recognition tasks a human or other visual animal may encounter. The article thus argues that a key ingredient in a comprehensive brain theory for object recognition is a computational framework that allows on-demand selection or adaptation of representations based on the current task and proposes a simple “first past the post” scheme (a temporal winner-take-all scheme) for self-selecting the most appropriate level of abstraction, given a finite set of available representations along a visual processing pathway.
Object Structure, Visual Processing emphasizes structure-processing tasks that call for separate treatment of various fragments of the visual stimulus, each of which spans only a fraction of the visual extent of the object or scene under consideration. Examples of structural tasks include recognition of part-part similarities, and identifying a region in an object toward which an action can be directed. After discussing object form processing in computer vision and relevant neurophysiological data on primate vision, the article focuses on two neuromorphic models of visual structure processing. The JIM model implements a recognition-by-components scenario based on geons (“geometrical elements,” which are generalized cylinders formed by moving a cross-section along a possibly curved axis). The Chorus of Fragments model exploits both the “what” and the “where” streams of visual cortex to recognize fragments no matter what their position, but then uses their approximate spatial relationships to see whether they together form cues for the recognition of an object. In particular, then, it avoids the binding problem of explicitly linking neural activity related to a specific object as a prerequisite to analysis of that object’s characteristics. (By contrast, Synchronization, Binding and Expectancy argues that the brain does solve the binding problem, and does so by synchronization of neural firing for those neurons related to a single object.)
Object Recognition, Neurophysiology reviews some theoretical approaches to object recognition in the context of mainly neurophysiological evidence. It also considers briefly the analysis of visual scenes. Scene analysis is relevant to object recognition because scenes may themselves be recognized initially at a holistic, object-like level, providing a context or “gist” that influences the speed and accuracy of recognition of the constituent objects. The article proposes that object recognition is based on a distributed, view-based representation in which objects are recognized on the basis of multiple, 2D-feature-selective neurons. Specialist cells appear to play a role in associating such feature combinations across certain nontrivial image transformations, coding for a proportion of stimuli in a largely view-invariant manner. The article offers evidence that a convergent hierarchy is used to build invariant representations over several stages, and that at each stage lateral competitive processes are at work between the neurons. It is argued that the association of views of objects observed over the course of time could play a key role in building up object representations. The review focuses mainly on the “what” stream of IT cortex, seen as the center of object recognition. Visual Scene Perception, Neurophysiology also brings in the “where/how” stream of parietal cortex as it analyzes how mechanisms that integrate schemas for recognition of different objects into the perception of some overall scene may be linked to the distributed planning of action. It also presents recent neurophysiology suggesting how the context of a natural scene may modify the response properties of neurons responsive to visual features.
The article compares three approaches—the slide-box metaphor, short-term memory in the VISIONS system, and the visuospatial scratchpad—for creating a theory of how the visual perception of objects may be integrated with the perception of spatial layout. The first two stress a schema-theoretic approach, while the latter is strongly tied to visual neurophysiology and modeling in terms of quasi-neural attractor networks. The aim is to open the way to future research that will embed the study of visual scene perception in an action-oriented integration of IT and parietal visual systems.
Auditory Cortex
Auditory Periphery and Cochlear Nucleus
Auditory Scene Analysis
Echolocation: Cochleotopic and Computational Maps
Electrolocation
Olfactory Bulb
Olfactory Cortex
Pain Networks
Prosthetics, Sensory Systems
Sensor Fusion
Somatosensory System
Somatotopy: Plasticity of Sensory Maps
Sound Localization and Binaural Processing
Here we analyze sensory systems other than vision—e.g., touch, audition, and pain. Moreover, when one sense cannot provide all the necessary information, complementary observations may be provided by another sense. For example, touch complements vision in placing a peg in a hole when the effector occludes the agent’s view. Also, senses may offer competing observations, such as the competition between vision and the vestibular system in maintaining balance (and its occasional side effect of seasickness). Another type of interplay between the senses is the use of information extracted by one sense to focus the attention of another sense, coordinating the two, as in audition cueing vision. Sensor Fusion explores a number of ways sensory information is brought together in the brains of diverse animals (snakes, cats, monkeys, humans) and surveys biologically inspired technological implementations (such as the use of infrared to enhance vision). (See also “Collicular Visuomotor Transformations for Gaze Control” for an important example of sensor fusion—the transformation of auditory and somatosensory cues into a visual map for the control of rapid eye movements.)
The road map Mammalian Brain Regions introduced a number of regions linked to sensory systems other than vision, but we will now meet a number of related and additional topics as well. Somatosensory System shows how the somatosensory system changes the tactile stimulus representation from a form more or less isomorphic to the stimulus to a completely distributed form in a series of partial transformations in successive subcortical and cortical networks. It further argues that the causal factors involved in body/object interactions are explicitly represented by an internal model in the pyramidal cells of somatosensory cortex that is crucial for haptic perception of proximal surroundings and for control of object manipulation. Somatotopy, a dominant feature of subdivisions of the somatosensory system, is defined by a topographic representation, or map, in the brain of sensory receptors on the body surface. Somatotopy: Plasticity of Sensory Maps shows that these orderly representations of cutaneous receptors in the spinal cord, lower brainstem, thalamus, and neocortex represent both the peripheral distribution of receptors and dynamic aspects of brain function. The article reviews evidence for somatosensory plasticity involving cortical reorganization after peripheral injury and as a result of training. The article analyzes the features of somatotopic maps that change, the contribution of subcortical changes to cortical plasticity, the mechanisms involved, and the functional consequences of sensory map changes. An important issue is the relation between the plasticity of the sensory and motor systems.
Pain Networks adds a new dimension to bodily sensation. The pain system encodes information on the intensity, location, and dynamics of tissue-threatening stimuli but differs from other sensory systems in its “emotional-motivational” factors (see also “Motivation”). In the pain system, these factors strongly modulate the relation between stimulus and felt response. At one extreme is allodynia, a state in which the slightest touch with a cotton wisp is agonizing. People display wide individual and trial-to-trial variability in the amount of pain reported following administration of calibrated noxious stimuli; pain sensation is subject to ongoing modulation by a complex of extrinsic (stimulus-generated) and intrinsic (CNS-generated) state variables. The article spells out how these act in the CNS as well as the periphery.
Auditory Periphery and Cochlear Nucleus spells out how the auditory periphery parcels out the acoustic stimulus across hundreds of nerve fibers, and how the cochlear nucleus continues this process by creating multiple representations of the original acoustic stimulus. The article emphasizes monaural signal processing, whereas Sound Localization and Binaural Processing shows how information from the two ears is brought together. The article focuses on the use of interaural time difference (ITD) as one way to estimate the azimuthal angle of a sound source. It describes one biological model (ITD detection in the barn owl’s brainstem) and two psychological models. The underlying idea is that the brain attempts to match the sounds in the two ears by shifting one sound relative to the other, with the shift that produces the best match assumed to be the one that just balances the “real” ITD. Auditory Cortex stresses the crucial role that auditory cortex plays in the perception and localization of complex sounds, examining auditory tasks vital for all mammals, such as sound localization, timbre recognition, and pitch perception. Auditory Scene Analysis discusses how the auditory system parses the acoustic mixture that reaches the ears of an animal to segregate a targeted sound source from the background of other sounds. The first stage, segmentation, decomposes the acoustic mixture into its constituent components. In the second stage, acoustic components that are likely to have arisen from the same environmental event are grouped, forming a perceptual representation (stream) that describes a single sound source. At the physiological level, segmentation corresponds (at least in part) to peripheral auditory processing, which performs a frequency analysis of the acoustic input, whereas the physiological substrate of auditory grouping is much less well understood.
The article focuses on models that are at least physiologically plausible, while noting that other models of auditory scene analysis adopt a more abstract information processing perspective.
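The shift-until-the-sounds-match idea for ITD estimation can be sketched as a cross-correlation over candidate lags, in the spirit of the classic Jeffress coincidence scheme (a generic sketch, not a model from the articles; signal lengths and the lag range are arbitrary):

```python
import numpy as np

def estimate_itd(left, right, max_lag=40):
    """Return the delay d (in samples) maximizing the match between
    left[t] and right[t + d]; positive d means the sound reached the
    right ear d samples later than the left ear."""
    n = len(left)
    best_d, best_c = 0, -np.inf
    for d in range(-max_lag, max_lag + 1):
        if d >= 0:
            c = float(np.dot(left[:n - d], right[d:]))
        else:
            c = float(np.dot(left[-d:], right[:n + d]))
        if c > best_c:
            best_d, best_c = d, c
    return best_d
```

Each candidate lag plays the role of one coincidence detector; the winning lag is taken to balance the real ITD and thus indicates the azimuthal angle of the source.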
Echolocation: Cochleotopic and Computational Maps provides us with a more detailed understanding of the auditory system in a very special class of mammals, the bats. Mustached bats emit echolocation (ultrasonic) pulses for navigation and for hunting flying insects. On the basis of the echo, prey must be detected and distinguished from the background clutter of vegetation, characterized as appropriate for consumption, and localized in space for orientation and prey capture. The bats emit ultrasonic pulses that consist of a long constant-frequency component followed by a short frequency-modulated component. Each pulse-echo combination provides a discrete sample of the continuously changing auditory scene. The auditory network contains two key design features: neurons that are sensitive to combinations of pulse and echo components, and computational maps that represent systematic changes in echo parameters to extract the relevant information.
Electrolocation is another sense that helps the animal locate itself in its world, but this time the animals are electric fishes rather than bats, and the signals are electrical rather than auditory. Electrolocation relates its topic to the general issue of mechanisms that facilitate the processing of relevant signals while rejecting noise, and of attentional processes that select which stimuli are to be attended to. Weakly electric fish generate an electrical field around their body and measure this field via electroreceptors embedded in the skin to “electrolocate” animate or inanimate targets in the environment. The article emphasizes a widespread but poorly understood characteristic of sensory processing circuits, namely, the presence of massive descending or feedback connections by which higher centers presumably modulate the operation of lower centers. Not only are response gain and receptive field organization controlled by these descending connections, but there are adaptive filtering mechanisms that can reject stimuli that otherwise might mask critical functions. This use of stored sensory expectations for the cancellation or perhaps the identification of specific input patterns may yield insights into diverse neural circuits, including the cochlear nuclei and the cerebellum, in other species.
Two articles introduce data and models for the olfactory system (see also the road map Mammalian Brain Regions). Olfactory Bulb describes the special circuitry involved in basic preprocessing, while Olfactory Cortex presents a dynamical systems analysis of further olfactory processing. The olfactory bulb receives input from the sensory neurons in the olfactory epithelium and sends its outputs to the olfactory cortex, among other brain regions. The bulb was one of the first regions of the brain for which compartmental models of neurons were constructed, which led to some of the first computational models of functional microcircuits. Olfactory Bulb gives an overview of olfactory bulb cells and circuits, current ideas about the computational functions of the bulb, and modeling studies to investigate these functions. The olfactory cortex is defined as the region of the cerebral cortex that receives direct connections from the olfactory bulb. It is the earliest cortical region to differentiate in the evolution of the vertebrate forebrain and the only region within the forebrain to receive direct sensory input. Moreover, the olfactory cortex has the simplest organization among the main types of cerebral cortex. Olfactory Cortex thus views it as a model for understanding basic principles underlying cortical organization.
Finally, a very different view of sensory systems is provided by Prosthetics, Sensory Systems, which discusses how information collected by electronic sensors may be delivered directly to the nervous system by electrical stimulation. After assessing the amenability of all sensory modalities (hearing, vision, touch, proprioception, balance, smell, and taste), the article focuses on auditory and visual prostheses. The great success story has been with cochlear implants. Here the article reviews improved temporospatial representations of speech sounds, combined electrical and acoustic stimulation in patients with residual hearing, and psychophysical correlates of performance variability. Since a prosthesis does not necessarily match natural neural encoding of a stimulus, the success of the prosthesis depends in part on the plasticity of the human brain as it remaps to accommodate this new class of signals. For example, the success of cochlear implants rests in part on the ability of auditory cortex to remap itself in a similar fashion to the remapping of somatosensory cortex described in Somatotopy: Plasticity of Sensory Maps.
Arm and Hand Movement Control
Biologically Inspired Robotics
Identification and Control
Motor Control, Biological and Theoretical
Potential Fields and Neural Networks
Q-Learning for Robots
Reactive Robotic Systems
Reinforcement Learning in Motor Control
Robot Arm Control
Robot Learning
Robot Navigation
Sensorimotor Learning
As noted in the “Historical Fragment” section of Part I, the interchange between biology and technology that characterizes the study of neural networks is an outgrowth of work in cybernetics in the 1940s. One of the keys to cybernetics was control (the other was communication of the kind studied in information theory). It is thus appropriate that control theory should have become a major application area for neural networks as well as being a key concept of brain theory. The objective of control is to influence the behavior of a dynamical system in some desired fashion. This includes maintaining the outputs of systems at constant values (regulation) or forcing them to follow prescribed time functions (tracking). Maintaining the altitude of an aircraft or the glucose level in the blood at constant values are examples of regulation; controlling a rocket to follow a given trajectory is an example of tracking. Motor Control, Biological and Theoretical sets forth the basic cybernetic concepts. A motor control system acts by sending motor commands to a controlled object, often referred to as “the plant,” which in turn acts on the local environment. The plant or the environment has one or more variables which the controller attempts to regulate. If the controller bases its actions on signals which are not affected by the plant output, it is said to be a feedforward controller. If the controller bases its actions on a comparison between desired behavior and the controlled variables, it is a feedback controller. “Motor Pattern Generation” provides a related perspective (see the road map Motor Pattern Generators).
The major advantage of negative feedback control, in which the controller seeks constantly to cancel the feedback error, is that it is a very simple, robust strategy that operates well without exact knowledge of the controlled object, and despite internal or external disturbances. The advantage of feedforward control is that it can, in the ideal case, give perfect performance with no error between the reference and the controlled variable. The main disadvantages are the practical difficulties in developing an accurate controller, and the lack of corrections for unexpected disturbances. Identification and Control explores the major strategy for developing an accurate controller, namely to “identify” the plant as belonging to (or more precisely, being well approximated by) a system obtained from a general family of systems by setting a key set of parameters (e.g., the coefficients in the matrices of a linear system). By coupling a controller to an identification procedure, one obtains an adaptive controller that can handle an unknown plant even if its dynamics are (slowly) changing. In both biology and many technological applications, nonlinearities and uncertainties play a major role, and linear approximations are not satisfactory. The article presents research using neural networks to handle these nonlinearities and examines the theoretical assumptions that have to be made when such networks are used as identifiers and controllers.
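This trade-off can be illustrated numerically with a toy first-order plant (all gains here are hypothetical): a proportional feedback law is robust to the unknown plant gain but leaves a small residual error, while a feedforward law that inverts a mismatched plant model settles at the wrong value.

```python
def simulate(controller, plant_gain=2.0, target=1.0, steps=100, dt=0.1):
    """Simulate a first-order plant dy/dt = plant_gain * u - y under a
    controller mapping (target, measured output) to the command u."""
    y = 0.0
    for _ in range(steps):
        u = controller(target, y)
        y += dt * (plant_gain * u - y)
    return y

# Feedback law: constantly act to cancel the error between desired and
# measured output; needs no knowledge of the plant gain.
feedback = lambda r, y: 4.0 * (r - y)

# Feedforward law: invert a model of the plant, ignoring the output;
# here the model's gain (1.6) mismatches the true gain (2.0).
assumed_gain = 1.6
feedforward = lambda r, y: r / assumed_gain
```

With these numbers, feedback settles near 8/9 of the target (the residual error of a pure proportional law) regardless of the true gain, while the mismatched feedforward law settles at 2.0/1.6 = 1.25 times the target; with an exact model it would be error-free, as the text notes for the ideal case.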
Reinforcement Learning in Motor Control recalls the general theory introduced in “Reinforcement Learning” and proceeds to note its utility in motor control. Many motor skills are attained in the absence of explicit feedback about muscle contractions or joint angles. In contrast to supervised learning, such learning depends on “reinforcement” (or evaluative feedback; it need not involve pleasure or pain), which tells the learner whether or not, and possibly by how much, its behavior has improved, or provides an indication of success or failure. Instead of trying to match a standard of correctness, a reinforcement learning system tries to maximize the goodness of behavior as indicated by evaluative feedback. To do this, it has to actively try alternatives, compare the resulting evaluations, and use some kind of selection mechanism to guide behavior toward the better alternatives. Q-Learning for Robots applies reinforcement learning techniques to robot control. Q-learning does not require a model of the robot-world interaction, and it uses learning examples in the form of triplets (situation, action, Q-value), where the Q-value is the utility of executing the action in the situation. Q-learning involves three different functions, evaluation, memorization, and updating. Heuristically adapted Q-learning has proved successful in applications such as obstacle avoidance, wall following, go-to-the-nest, etc., using neural-based implementations such as multilayer perceptrons trained with backpropagation, or self-organizing maps.
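A minimal tabular sketch of the (situation, action, Q-value) idea, on a hypothetical one-dimensional corridor task (the world, the learning rates, and the reward are illustrative assumptions, not taken from the article):

```python
import random
import numpy as np

def q_learn(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning on a toy 1-D corridor: states 0..n-1, actions
    0 = left, 1 = right, reward 1 for reaching the rightmost state.
    Q[s, a] is the learned utility of taking action a in situation s."""
    random.seed(0)
    Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = random.randrange(n_states - 1)      # start in a non-goal state
        while s != n_states - 1:
            # epsilon-greedy: mostly exploit, sometimes try alternatives
            a = random.randrange(2) if random.random() < eps else int(np.argmax(Q[s]))
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # update: move Q toward reward plus discounted best future value
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    return Q
```

Note that no model of the world's transitions is ever built: the update uses only experienced (situation, action, evaluation) triplets, which is what makes the method attractive for robot control.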
Sensorimotor Learning explains how neural nets can acquire “models” of some desired sensorimotor transformation. A forward model is a representation of the transformation from motor commands to movements, in other words, a model of the controlled object. An inverse model is a representation of the transformation from desired movements to motor commands, and so can be used as the controller for the controlled object. The management of multiple models, each with its own range of applicability in given tasks, is given special attention. Robot Learning focuses on learning robot control, the process of acquiring a sensorimotor control strategy for a particular movement task and movement system. The article offers a formal framework within which to discuss robot learning in terms of the different methods that have been suggested for the learning of control policies, such as learning the control policy directly, learning the control policy in a modular way, indirect learning of control policies, imitation learning, and learning of motor control components. The article also reviews specific function approximation problems in robot learning, including neural network approaches. Robot Arm Control addresses related issues concerning the availability of precise mappings from physical space or sensor space to joint space or motor space. Robot arm controllers are usually hierarchically structured from the lowest level of servomotors to the highest levels of trajectory generation and task supervision. In each case an actual motion is made to follow as closely as possible a commanded motion through the use of feedback. The difference lies in the coordinate systems used at each level.
At least four coordinate spaces can be distinguished: the task space (used to specify tasks, possibly in terms of sensor readings), the workspace (6D Cartesian coordinates defining a position and orientation of the end-effector), the joint space (intrinsic coordinates determining a robot configuration), and the actuator space (in which actual motions are commanded). Correlational procedures carry out feature discovery or clustering and are often used to represent a given state space in a compact and topology-preserving manner, using procedures such as those described in “Self-Organizing Feature Maps.” Error-minimization procedures require explicit data on input-output pairs; their goal is to build a mapping from inputs to outputs that generalizes adequately using, e.g., the least-mean-squares (LMS) rule and backpropagation. Between these two extremes lie procedures that use reinforcement learning to build a mapping that maximizes reward. Arm and Hand Movement Control discusses some of the most prominent regularities of arm and hand control, and examines computational and neural network models designed to explain them. The analysis reveals an interesting competition between explanations sought on the neural, biomechanical, perceptual, and computational levels that has created its share of controversy. Whereas some topics, such as internal model control, have gained solid grounding, the importance of the dynamic properties of the musculoskeletal system in facilitating motor control, the role of real-time perceptual modulation of motor control, and the balance between dynamical systems models versus optimal control-based models are still seen as offering many open questions.
Biologically Inspired Robotics describes how modern robotics may learn from the way organisms are constructed biologically and how this creates adaptive behaviors. (I cannot resist noting here the acronym introduced by R. I. Damper, R. L. B. French, and T. W. Scutt, 2000, ARBIB: An Autonomous Robot Based on Inspiration from Biology, Robotics and Autonomous Systems, 31:247–274.) Research on autonomous robots based on inspiration from biology ranges from modeling animal sensors in hardware to guiding robots in target environments to investigating the interaction between neural learning and evolution in a variety of robot tasks. After reviewing the historical roots of the subject, the article provides a general introduction to biologically inspired robotics, with special emphasis on the ideas that the robot is situated in the world and that many complex behaviors are emergent properties of the collective effects of linking a variety of simple behaviors. Reactive Robotic Systems provides a conceptual framework for robotics that is rooted in “Schema Theory” (q.v.) rather than symbolic AI. Here, robot behavior is controlled by the activation of a collection of low-level primitive behaviors (schemas), and complex behavior emerges through the interaction of these schemas and the complexities of the environment in which the robot finds itself. This work was inspired in part by studies of animal behavior (see, e.g., “Neuroethology, Computational” and related articles discussed in the road maps on Motor Pattern Generators and Neuroethology and Evolution). However, the article not only shows the power of reactive robots in many applications, it also notes the utility of hybrid systems capable of using deliberative reasoning as well as reactive execution (which fits with an evolutionary view of the human brain in which reactive systems handle many functions but can be overruled or orchestrated by, e.g., the deliberative activities of prefrontal cortex).
Robot Navigation examines how to get a mobile robot to move to its destination efficiently (e.g., along short trajectories) and safely (i.e., without colliding). If a target location is either visible or identified by a landmark (or sequence of landmarks), a simple stimulus-response strategy can be adopted. However, if targets are not visible, the robot needs a model (or map) of the environment encoding the spatial relationships between its present and desired locations. Sensor uncertainty, together with the inaccuracy of the robot’s actuators and the unpredictability of real environments, makes the design of mobile robot controllers a difficult task. It has thus proved desirable to endow robots with learning capabilities in order to acquire their control systems autonomously and to adapt their behavior to situations they have never experienced. The article thus reviews neural approaches to localization, map building, and navigation. More specifically, Potential Fields and Neural Networks examines biological findings on the use of potential fields (which represent, e.g., the force field that drives the motor output of an animal or part of an animal, such as a limb) to characterize the control and learning of motor primitives. The notion of potential fields has also been used to model externally induced constraints as well as internally constructed sensorimotor maps for robot motion control. A robot can reach a stable configuration in its environment by following the negative gradient of its potential field. In this case, the configurations reached will be locally stable but may not be optimal with respect to some behavioral criterion. This deficit can be overcome either by incorporating a global motion planner or by using a harmonic function that does not contain any local minima. The article further indicates how potential field–based motion control can benefit from the use of ANN-based learning.
There are links here to the more biological concerns of the articles “Cognitive Maps,” “Hippocampus: Spatial Models,” and “Motor Primitives.”
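The gradient-following scheme just described can be sketched in a few lines of Python. This is an illustration only: the quadratic attractive well, the inverse-distance repulsive field, and every numerical constant are invented for the example rather than taken from any of the articles.

```python
import math

def grad(p, goal, obs, k=1.0, eta=0.5, d0=1.0):
    # Attractive gradient: quadratic potential well centered on the goal.
    gx, gy = k * (p[0] - goal[0]), k * (p[1] - goal[1])
    # Repulsive gradient: active only within radius d0 of the obstacle.
    dx, dy = p[0] - obs[0], p[1] - obs[1]
    d = math.hypot(dx, dy)
    if d < d0:
        mag = eta * (1.0 / d - 1.0 / d0) / (d * d)
        gx -= mag * dx / d   # net effect: push away from the obstacle
        gy -= mag * dy / d
    return gx, gy

def descend(start, goal, obs, step=0.01, iters=4000):
    # Follow the negative gradient of the combined potential field,
    # tracking the closest approach to the obstacle along the way.
    p, clearance = list(start), float("inf")
    for _ in range(iters):
        gx, gy = grad(p, goal, obs)
        p[0] -= step * gx
        p[1] -= step * gy
        clearance = min(clearance, math.hypot(p[0] - obs[0], p[1] - obs[1]))
    return p, clearance

p, clearance = descend((0.0, 0.0), (5.0, 5.0), (2.5, 2.0))
assert math.hypot(p[0] - 5.0, p[1] - 5.0) < 0.05   # reached the goal
assert clearance > 0.15                            # skirted the obstacle
```

Because the obstacle here sits slightly off the straight line from start to goal, plain gradient descent skirts it; had it sat exactly on that line, the descent could stall at a local equilibrium, which is precisely the deficit that harmonic functions or a global motion planner are introduced to overcome.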
Chains of Oscillators in Motor and Sensory Systems
Command Neurons and Command Systems
Crustacean Stomatogastric System
Gait Transitions
Half-Center Oscillators Underlying Rhythmic Movements
Locomotion, Invertebrate
Locomotion, Vertebrate
Locust Flight: Components and Mechanisms in the Motor
Motor Pattern Generation
Motor Primitives
Respiratory Rhythm Generation
Scratch Reflex
Sensorimotor Interactions and Central Pattern Generators
Spinal Cord of Lamprey: Generation of Locomotor Patterns
Motor Pattern Generation provides an overview of the basic building blocks of behavior (see “Motor Control, Biological and Theoretical” for more general background) to be expanded upon in many of the following articles. The emphasis is on rhythmic behaviors (such as flight or locomotion), but a variety of “one-off” motor patterns (as typified in a frog snapping at its prey) are also studied. The crucial notion is that a central pattern generator (CPG), an autonomous neural circuit, can yield a good “sketch” of a movement, but that the full motor pattern generator (MPG) augments the CPG with sensory input which can adjust the motor pattern to changing circumstances (e.g., the pattern of locomotion varies when going uphill rather than on level terrain, or when the animal carries a heavy load). Sensorimotor Interactions and Central Pattern Generators discusses both the impact of sensory information on CPGs and the influence of motor systems on sensory activity. It stresses that interaction between motor and sensory systems is pervasive, from the first steps of sensory detection to the highest levels of processing, emphasizing that descending motor commands are only acted upon by spinal circuits when these circuits integrate their intrinsic activity with all incoming information.
Command Neurons and Command Systems analyzes the extent to which an MPG may be activated alone or in concert with others through perceptual stimuli mediated by a single “command neuron” or by more diffuse “command systems.” Command functions provide the sensorimotor interface between sensory pattern recognition and localization, on the one side, and motor pattern generation on the other. For example, if a certain interneuron is stimulated electrically in the brain of a marine slug, the animal then displays a species-specific escape swimming behavior, although no predator is present. If in a toad a certain brain area of the optic tectum is stimulated in this manner, snapping behavior is triggered, although no prey is present. In both cases, a stimulus produces a rapid ballistic response.
Motor Primitives and Scratch Reflex look at two behaviors (the former studied in frogs, the latter primarily in turtles) elicited by an irritant applied to the animal’s skin. In each case, the position at which the limb is aimed varies with the position of the irritant; there is somatotopic (i.e., based on place on the body) control of the reflex. In both frog and turtle, and thus more generally, spinal cord neural networks can by themselves generate complex sensorimotor transformations even when disconnected from supraspinal structures. Moreover, each reflex has different “modes.” To understand this, just think of scratching your lower back. As the scratch site moves higher, the positioning of the limb changes continuously with the position of the irritant until the irritant moves up so much that you make a discontinuous switch to the “over-the-shoulder” mode of back-scratching. The mode changes in these two articles may be compared to the Gait Transitions (q.v.), discussed below. In any case, we see here two important issues: how is an appropriate pattern of action chosen, and how is the chosen pattern parameterized on the basis of sensory input? Motor Primitives advances the idea that CPGs construct spinal motor acts by recruiting a few motor primitives from a set encoded in the spinal cord. The best evidence comes from examination of wiping movements and microstimulation of frog spinal cord, where movements are constructed as a sequencing and combination of a collection of force-field motor primitives or fundamental elements. “Visuomotor Coordination in Frog and Toad” discusses how the frog’s motor acts may be assembled on the basis of visual input.
With this, we switch to articles in which the emphasis is on rhythmic behavior, with rather little concern for the spatial structure of the movement (for example, the discussion of locomotion will focus on coordinating the rhythms of the legs when the animal progresses straight ahead, rather than on how these rhythms are modified when the animal traverses uneven terrain or turns to avoid an obstacle). Crustacean Stomatogastric System analyzes specific circuits of identified neurons controlling the chewing (by teeth inside the stomach) behavior of crustaceans. Of particular interest is the finding that neuropeptides (see “Neuromodulation in Invertebrate Nervous Systems”) can change the properties of cells and the strengths of connections so that, e.g., a cell can become a pacemaker or a previously ineffective connection can come to exert a strong influence, and with this a network can dramatically change its overall behavior. Thus, the change of “mode” may be under the control of an explicit chemical “switch” of underlying cellular properties. Of course, in some systems, different input patterns of excitation and inhibition may enable a given circuit to act in one of several modes; while in other cases the change of mode may involve the transfer of control from one neural circuit to another. Locomotion, Invertebrate focuses on invertebrate locomotion systems for which quantitative modeling has been done, reviewing computer models of swimming, flying, crawling, and walking, paying special attention to the interaction of neural networks with the biomechanical systems they control. The article also reviews the use of biologically inspired locomotion controllers in robotics, stressing their distributed nature, their robustness, and their computational efficiency. Conversely, robots can serve as an important new modeling methodology for testing biological hypotheses. 
Locust Flight: Components and Mechanisms in the Motor narrows the focus to one specific invertebrate motor system. The article emphasizes the interactions of the intrinsic properties of flight neurons, the operation of complex circuits, and phase-specific proprioceptive input, all subject to the concentrations of circulating neuromodulators. Locust flight can adapt to the demands of a constantly changing sensory environment, and the flight system is flexible, able to operate despite severe ablations and then to recover from these lesions.
Half-Center Oscillators Underlying Rhythmic Movements looks at a set of minimal circuits for generating rhythmic behavior, starting with the half-center oscillator model first proposed to account for the observation that spinal cats (i.e., cats in which connections between brain and spinal cord had been severed) could produce stepping movements even when all sensory feedback from the animal’s motion was eliminated. The article shows the utility of models of this type in analyzing rhythms in invertebrates as well as vertebrates—the pelagic mollusk Clione, tadpoles, and lampreys—in terms of the intrinsic membrane properties of the component neurons interacting with reciprocal inhibition to initiate and sustain oscillation in these networks. Spinal Cord of Lamprey: Generation of Locomotor Patterns marks an important transition: from seeing how one network can oscillate to seeing how the oscillation of a series of networks can be coordinated. Experiments show that neural circuitry in isolated pieces of the spinal cord of lamprey (a jawless, primitive type of fish) can exhibit oscillations, and when these pieces constitute an intact spinal cord, they all oscillate with the same frequency but form a “traveling wave” whose phase relationships in the complete fish would produce a wave of bending progressing down the body from head to tail, yielding the coordinated “wiggling” that propels swimming. The article reviews the interaction between experimentation and modeling stimulated by such findings. Respiratory Rhythm Generation presents several alternative models of breathing and evaluates them against mammalian data. These data point to the importance both of endogenous bursting neurons and of network interactions in generating the basic rhythm. In most models, rhythmogenesis is either pacemaker or network driven.
The article reviews the data and these models, and then points the way to future models that clarify the integration of endogenous bursting with network interactions. Locomotion, Vertebrate shows how neural networks in the spinal cord generate the basic rhythmic patterns necessary for vertebrate locomotion, while higher control centers interact with the spinal circuits to control posture and produce accurate limb movements, sending higher-level commands such as stop and go signals, speed, and heading of motion. In mammals, evolution of the CPGs has been accompanied by important modifications of the descending pathways under the requirements of complex posture control and accurate limb movements, although the extent of the respective changes remains unknown. Computer models that combine neural models with biomechanical models are seen as having an important role to play in studying these issues. One example uses “genetic algorithms” to model the transition from a lamprey-like spinal cord that supports traveling waves to a salamander-like spinal cord that supports both traveling waves for swimming and “standing waves” for terrestrial locomotion, and shows how vision may modulate spinal activity to yield locomotion toward a goal (see also “Visuomotor Coordination in Salamander”).
Chains of Oscillators in Motor and Sensory Systems abstracts from the specific circuitry to show how oscillators and their coupling can be characterized in a way that allows the proof of mathematical theorems about patterns of coordination. CPGs are discussed not only for the spinal cord of lamprey, but also for the crayfish swimmeret system and the leech swimming network. In the context of locomotion, each oscillator is likely to be a local subnetwork of neurons that produces rhythmic patterns of membrane potentials. Since the details of the oscillators often are not known and are difficult to obtain, the object of the mathematics is to find the consequences of what is known, and to generate sharper questions to motivate further experimentation. Gait Transitions also studies its topic (e.g., the transition from walking to running) from the abstract perspective of dynamical systems.
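The style of abstraction used in these studies can be illustrated with a minimal phase-oscillator chain. In the Python sketch below, each segmental oscillator is reduced to a single phase variable, coupling is one-way (descending) from each oscillator to its caudal neighbor, and the frequency, coupling strength, and imposed lag are all invented constants chosen for the example.

```python
import math

def simulate_chain(n=10, omega=2 * math.pi, K=5.0, lag=0.3,
                   dt=0.001, steps=20000):
    # Each segment is one phase variable; segment i is pulled toward
    # trailing its rostral neighbor i-1 by a fixed phase lag.
    theta = [0.1 * i * i % 1.0 for i in range(n)]  # arbitrary initial phases
    for _ in range(steps):
        new = [theta[0] + dt * omega]              # head oscillator runs free
        for i in range(1, n):
            coupling = K * math.sin(theta[i - 1] - theta[i] - lag)
            new.append(theta[i] + dt * (omega + coupling))
        theta = new
    return theta

theta = simulate_chain()
# After settling, every consecutive phase difference equals the imposed
# lag: a traveling wave runs down the chain, as in lamprey swimming.
diffs = [theta[i] - theta[i + 1] for i in range(len(theta) - 1)]
assert all(abs(d - 0.3) < 1e-3 for d in diffs)
```

The constant intersegmental lag that emerges is the traveling-wave coordination discussed above; the mathematical theory of such chains predicts when, and how quickly, this phase locking occurs.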
Action Monitoring and Forward Control of Movements
Arm and Hand Movement Control
Basal Ganglia
Cerebellum and Motor Control
Collicular Visuomotor Transformations for Gaze Control
Equilibrium Point Hypothesis
Eye-Hand Coordination in Reaching Movements
Geometrical Principles in Motor Control
Grasping Movements: Visuomotor Transformations
Hippocampus: Spatial Models
Imaging the Motor Brain
Limb Geometry, Neural Control
Motor Control, Biological and Theoretical
Motor Cortex: Coding and Decoding of Directional Operations
Motoneuron Recruitment
Muscle Models
Optimization Principles in Motor Control
Prosthetics, Motor Control
Pursuit Eye Movements
Reaching Movements: Implications for Computational Models
Reinforcement Learning in Motor Control
Rodent Head Direction System
Sensorimotor Learning
Vestibulo-Ocular Reflex
Muscle transduces chemical energy into force and motion, thereby providing power to move the skeleton. Because of the intricacies of muscle microstructure and architecture, no comprehensive models are yet able to predict muscle performance completely. Muscle Models reviews three classes of models, each fulfilling a more narrowly defined objective, ranging from attempts to understand the molecular level (cross-bridge models) through lumped parameter mechanical models to input-output models of whole muscle behavior that can be used as part of a broader study of basic musculoskeletal biomechanics or issues of neural control. A motor neuron together with the muscle fibers that it innervates constitutes a motor unit, and each muscle is a composite structure whose force-generating components, the motor units, are typically heterogeneous. Such aggregates can produce much larger forces than a single motor unit. Motoneuron Recruitment shows how the motor units can be recruited in the service of reflexes, voluntary movement, and posture. The article considers mechanisms that compensate for muscle fatigue and yielding, models the possible role of Renshaw cells in linearization or equalization of motor neuron pool responses, and considers the possible role of cerebellum in control of motor neuron gain, as well as the roles of motor cortex in motor neuron recruitment.
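To give the flavor of the lumped-parameter class of models, the following Python fragment sketches a generic Hill-type muscle, in which active force is the product of activation, a force-length factor, and a force-velocity factor. The bell-shaped force-length curve, the shape of the force-velocity relation, and every constant are illustrative assumptions invented for the example, not values from the article.

```python
import math

def hill_force(act, l_norm, v_norm, f_max=1.0):
    """Generic Hill-type lumped-parameter muscle (illustrative shapes only).
    act: activation in [0, 1]; l_norm: length / optimal length;
    v_norm: velocity / max shortening velocity (negative = shortening)."""
    # Bell-shaped force-length curve peaking at optimal length.
    f_l = math.exp(-((l_norm - 1.0) / 0.45) ** 2)
    if v_norm < 0:
        # Shortening: force falls off hyperbolically, reaching zero
        # at the maximum shortening velocity (v_norm = -1).
        f_v = max((1.0 + v_norm) / (1.0 - v_norm / 0.25), 0.0)
    else:
        # Lengthening: force rises above isometric, saturating at 1.5.
        f_v = 1.0 + 0.5 * v_norm / (v_norm + 0.25)
    return act * f_max * f_l * f_v

# Isometric force at optimal length equals full activation...
assert abs(hill_force(1.0, 1.0, 0.0) - 1.0) < 1e-12
# ...shortening reduces force, lengthening increases it,
# and departing from optimal length weakens the muscle.
assert hill_force(1.0, 1.0, -0.5) < hill_force(1.0, 1.0, 0.0)
assert hill_force(1.0, 1.0, 0.5) > 1.0
assert hill_force(1.0, 0.5, 0.0) < 0.5
```

Such an input-output model says nothing about cross-bridge kinetics, but it is exactly the kind of component that slots into broader musculoskeletal or neural-control simulations.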
Prosthetics, Motor Control deals with the use of electrical stimulation to alter the function of motor systems, either directly or indirectly. The article presents three clinical applications. Therapeutic electrical stimulation is electrically produced exercise in which the beneficial effect occurs primarily off-line as a result of trophic effects on muscles and perhaps the CNS. Neuromodulatory stimulation is preprogrammed stimulation that directly triggers or modulates a function without ongoing control or feedback from the patient. Functional electrical stimulation (FES) provides precisely controlled muscle contractions that produce specific movements required by the patient to perform a task. The article also describes subsystems for muscle stimulation, sensory feedback, sensorimotor regulation, control systems, and command signals, most of which are under development to improve on-line control of FES.
Motor Control, Biological and Theoretical sets forth the basic cybernetic concepts. A motor control system acts by sending motor commands to a controlled object, often referred to as “the plant,” which in turn acts on the local environment. The plant or the environment has one or more variables that the controller attempts to regulate. If the controller bases its actions on signals that are not affected by the plant output, it is said to be a feedforward controller. The full understanding of movement must rest on a full analysis of the integration of neural networks with the biomechanics of the skeletomuscular system. Nonetheless, much has been learned about limb control from a more abstract viewpoint, as the next four articles show. Optimization theory has become an important aid to discovering organizing principles that guide the generation of goal-directed motor behavior, specifying the results of the underlying neural computations without requiring specific details of the way those computations are carried out. Optimization Principles in Motor Control concedes that not all motor behaviors are necessarily optimal but argues that attempts to identify optimization principles can yield a useful taxonomy of motor behavior. The hypothesis is that in performing a motor task, the brain produces coordinated actions that minimize some measure of performance (such as effort, smoothness, etc.). The article reviews several studies in which such ideas were examined in the context of planar upper limb movements, comparing the purely kinematic minimum jerk model with the more dynamics-based minimum torque change model. But how does one go from a kinematic description of the movement of the hand to the pattern of muscle control that yields it? There are still many competing hypotheses. One approach seeks to find control systems that yield optimal trajectories in the absence of disturbances.
Another starts from the observation that a muscle is like a controlled-length spring: set its length, and it will naturally return to the equilibrium length that was set. The Equilibrium Point Hypothesis builds on this observation to offer a systems-level description of how the nervous system controls the muscles so that a stable posture is maintained or a movement is produced. In this framework, the controller is composed of muscles and the spinal-based reflexes, and the plant is the skeletal system. The controller defines a force field that is meant to capture the mechanical behavior of the muscles and the effect of spinal reflexes. The equilibrium point hypothesis views motion as a gradual postural transition, and it is suggested that for the case of multijoint arm movements, one can predict the hand’s motion if the supraspinal system smoothly shifts the equilibrium point from the start point to a target location. Geometrical Principles in Motor Control considers a different transition, that from the spatial representation of a motor goal to a set of appropriate neuromuscular commands, which is in many respects similar to a coordinate transformation. (A word of caution: The matter is subtle because the brain rarely has neurons whose firing encodes a single coordinate. Consider, for example, retinotopic coding as distinct from the specific use of (x, y) or (r, θ) coordinates. Thus the issue is whether the activity in certain networks is better described as encoding one representation than another, such as those related to the eye rather than those related to the shoulder.) The article describes three types of coordinate system—end-point coordinates, generalized coordinates, and actuator coordinates—each representing a particular “point of view” on motor behavior, then examines the geometrical rules that govern the transformations between these classes of coordinates.
It shows how a proper representation of dynamics may greatly simplify the transformation of motor planning into action. Limb Geometry, Neural Control offers another perspective, starting from a discussion of the role of extrinsic and intrinsic coordinates when a human makes a movement. Multijointed coordination complicates the problem of motor control. Consider the case of arm movements. The activation of an elbow flexor will always contribute a flexor torque at the elbow, but the resulting elbow movement can be flexion, extension, or no motion at all, depending on the actively produced torque at the shoulder. Although in principle a coordinated motor action could be planned muscle by muscle, a more parsimonious solution is to plan more global goals at higher levels of organization and let the lower-level controllers specify the implementation details. The article reviews issues related to the kinematic aspects of limb geometry control for arm movements and for posture and gait.
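To make one of these abstract principles concrete: the minimum jerk model discussed under Optimization Principles in Motor Control admits a well-known closed-form solution for point-to-point movements, the quintic polynomial sketched below in Python (the endpoints and duration chosen are arbitrary; only the polynomial itself comes from the model).

```python
def minimum_jerk(x0, xf, T, t):
    """Position at time t of the minimum-jerk (maximally smooth)
    point-to-point trajectory from x0 to xf over duration T."""
    tau = t / T
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

# The endpoints are hit exactly, and the path is symmetric in time.
assert minimum_jerk(0.0, 1.0, 1.0, 0.0) == 0.0
assert minimum_jerk(0.0, 1.0, 1.0, 1.0) == 1.0
assert abs(minimum_jerk(0.0, 1.0, 1.0, 0.5) - 0.5) < 1e-12
# Velocity (estimated numerically) vanishes at movement onset,
# giving the smooth bell-shaped speed profile the model predicts.
eps = 1e-6
v0 = (minimum_jerk(0.0, 1.0, 1.0, eps) - minimum_jerk(0.0, 1.0, 1.0, 0.0)) / eps
assert v0 < 1e-4
```

The appeal of such optimization accounts is visible even in this sketch: a single smoothness criterion fixes the entire kinematic profile, leaving nothing movement-specific to specify beyond start, end, and duration.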
Fast, coordinated movements depend on the nervous system being able to use copies of motor control signals (the corollary discharge) to compute expectations of how the body will move, rather than always waiting for sensory feedback to signal the current state of the body. Action Monitoring and Forward Control of Movements spells out three functions of corollary discharge. First, the stability of visual perception during eye movements was among the earliest physiological applications proposed for an internal comparison between a movement and its sensory outcome. Second, goal-directed behavior implies that the action should continue until the goal has been satisfied, so that motor representations must involve not only forward mechanisms for steering the action but also mechanisms for monitoring its course and checking its completion. Third, similar processes have been postulated for actions aimed at complex and relatively long-term goals, for comparing the representation of the intended action to the actual action and compensating for possible mismatch between the two. Clearly, the effective use of corollary discharge rests on the brain having learned the relation between current state, motor command, and the movement that ensues. Sensorimotor Learning explains how neural nets can acquire forward and inverse “models” of some desired sensorimotor transformation. The managing of multiple models, each with its own range of applicability in given tasks, is given special attention. The relevance of such models to the role of cerebellum (Cerebellum and Motor Control) is briefly noted, as is the idea that these models may act by controlling lower-level “Motor Primitives” (q.v.). Reinforcement Learning in Motor Control, which presents general learning strategies based on adaptive neural networks, is treated further in the road map Robotics and Control Theory.
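The logic of corollary discharge can be caricatured in a few lines of Python. This is a deliberately simplified sketch, not drawn from the articles: the plant is assumed to be a pure integrator, and the forward model is assumed to know that dynamics exactly.

```python
def forward_model(x_est, u):
    # Predict the sensory consequence of command u from its efference
    # copy, using an internal model of the plant (here assumed, for
    # simplicity, to be a pure integrator).
    return x_est + u

def plant(x, u, perturbation=0.0):
    # The real "plant": the body plus any unexpected external force.
    return x + u + perturbation

x = x_est = 0.0
for u in (0.2, 0.2, 0.2):
    predicted = forward_model(x_est, u)
    x = plant(x, u)
    assert abs(x - predicted) < 1e-12  # self-generated motion: no surprise
    x_est = predicted

# An external push is detected at once as a prediction mismatch,
# without waiting for slow sensory feedback loops.
predicted = forward_model(x_est, 0.2)
x = plant(x, 0.2, perturbation=0.5)
mismatch = x - predicted
assert abs(mismatch - 0.5) < 1e-12
```

Self-generated motion produces no mismatch between predicted and actual state, while an external perturbation shows up immediately as a prediction error; this comparison is the computational core of the monitoring functions described above.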
With this background, we turn to articles primarily concerned with visually controlled behaviors for which neurophysiological data are available from the mammalian (and in many cases the monkey) brain, as well as behavioral and, in some cases, imaging data for humans. The road map takes us from basic unconscious behaviors to those involving skilled action. The vestibulo-ocular reflex (VOR) serves to stabilize the retinal image by producing eye rotations that counterbalance head rotations. Vestibular nuclei neurons are much more than a simple relay; their functions include multimodality integration, temporal signal processing, and adaptive plasticity. Vestibulo-Ocular Reflex reviews the empirical data, as well as control-theoretic and neural network models for the neural circuits that mediate the VOR. These perform diverse computations that include oculomotor command integration, temporal signal processing, temporal pattern generation, and experience-dependent plasticity.
Collicular Visuomotor Transformations for Gaze Control analyzes the role of superior colliculus in the control of the rapid movement, called a saccade, of the eyes toward a target. The article touches on afferent and efferent mapping, target selection, visuomotor transformations in motor error maps, remapping models, and coding of dynamic motor error. The theme of remapping is pursued in Dynamic Remapping, which distinguishes “one-shot” remapping (updating the internal representation in one operation to compensate for an entire movement) from a continuous remapping process based on the integration of a velocity signal or the relaxation of a recurrent network. In both cases, the problem amounts to moving a hill of activity in neuronal maps. The article uses data on arm movements as well as saccades. Models can be constrained by considering deficits that accompany localized lesions in humans. These data not only provide valuable insights into the nature of remappings but they might also help bridge the gap between behavior and single-cell responses. Pursuit Eye Movements takes us from saccadic “jumps” to those smooth eye movements involved in following a moving target. Current models of pursuit vary in their organization and in the features of pursuit that they are designed to reproduce. Three main types of model are “image motion” models, “target velocity” models, and models that address the role of prediction in pursuit. However, these models make no explicit reference to the neural structures that might be responsible. The article thus analyzes the neural pathways for pursuit, stressing the importance of both visual areas of the cerebral cortex and oculomotor regions of the cerebellum, to set goals for future modeling.
Imaging the Motor Brain shows that the behavioral form and context of a movement are important determinants of functional activity within cortical motor areas and the cerebellum, stressing that functional imaging of the human motor system requires one to study the interaction of neurological and cognitive processes with the biomechanical characteristics of the limb. Neuroimaging shows that multiple neural systems and their functional interactions are needed to successfully perform motor tasks, encode relevant information for motor learning, and update behavioral performance in real time. The article discusses how evidence from functional imaging studies provides insight into motor automaticity as well as the role of internal models in movement.
Two articles explore the way in which the rat charts the spatial structure of its environment, using both “landmark cues” and a sense of its head orientation with respect to some key aspects of its environment. Hippocampus: Spatial Models starts with the finding that single-unit recordings in freely moving rats have revealed “place cells” in fields CA3 and CA1 of the hippocampus, so called because their firing is restricted to small portions of the rat’s environment (the corresponding place fields), but the firing properties of place cells change when the rat is placed in a new environment. The article focuses on data and models for the role of place cell firing in the rat’s navigation (see “Cognitive Maps” for a less neurophysiological approach to the same general issues). Rodent Head Direction System focuses on head direction cells in a number of brain areas that fire maximally when the rat’s head is pointed in a specific preferred direction, with a gradual falloff in firing as the heading departs from that direction. Head direction is not a simple reflection of sensory stimuli since, for example, the neural coding can be updated when the animal turns in the dark. The authors analyze such phenomena using attractor networks.
The next six articles are concerned with reaching and grasping. Motor Cortex: Coding and Decoding of Directional Operations spells out the relation between the direction of reaching and changes in neuronal activity that have been established for several brain areas, including the motor cortex. The cells involved each have a broad tuning function, the peak of which denotes the “preferred” direction of the cell. A movement in a particular direction will engage a whole population of cells. It is found that the weighted vector sum of these neuronal preferences is a “population vector” that points in (or close to) the direction of the movement for discrete movements in 2D and 3D space. Further observations link this population encoding to speed of movement as well as to preparation for movement. The present article addresses the question of how movement variables are encoded in the motor cortex and how this information could be used to drive a simulated actuator that mimics the primate arm. Arm and Hand Movement Control discusses some of the most prominent regularities of arm and hand control, and examines computational and neural network models designed to explain them. The analysis reveals the controversies engendered by competition between explanations sought on different levels—neural, biomechanical, perceptual, or computational. Although some topics, such as internal model control, have gained solid grounding, the importance of the dynamic properties of the musculoskeletal system in facilitating motor control, the role of real-time perceptual modulation of motor control, and the balance between dynamical systems models versus optimal control-based models are still seen as offering many open questions. Reaching Movements: Implications for Computational Models reviews a number of issues that are emerging from neurophysiological studies of motor control and stresses their implications for development of future models.
Data on movement planning, trajectory generation, temporal features of cortical activity, and overlapping polymodal gradients are used to set challenges for computational models that will meet the demands of both functional competence and biological plausibility.
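The population-vector computation described above can be sketched in a few lines: each broadly tuned cell votes with a unit vector along its preferred direction, weighted by its firing rate, and the vector sum points close to the movement direction. The tuning curve, noise level, and cell count below are illustrative, not values from the experimental literature.

```python
import numpy as np

# Sketch of population-vector decoding: cosine-tuned cells vote with their
# preferred directions, weighted by firing rate.  Values are illustrative.
rng = np.random.default_rng(2)
n_cells = 100
preferred = rng.uniform(0.0, 2.0 * np.pi, n_cells)   # preferred directions
movement = np.deg2rad(135.0)                          # actual movement direction

# broad (half-rectified cosine) tuning plus trial-to-trial noise
rates = np.maximum(np.cos(movement - preferred), 0.0)
rates += 0.05 * rng.standard_normal(n_cells)

# weighted vector sum of the cells' preferred directions
pv_x = np.sum(rates * np.cos(preferred))
pv_y = np.sum(rates * np.sin(preferred))
decoded = np.arctan2(pv_y, pv_x) % (2.0 * np.pi)
```

Even though no single cell encodes the movement direction precisely, the population vector recovers it to within a few degrees, which is the key point of the population-coding argument.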
Eye-Hand Coordination in Reaching Movements focuses on possible mechanisms responsible for visually guiding the hand toward a point within the prehension space. Reaching toward a visual target requires transformation of visual information about target position into a frame of reference suitable for the planning of hand movement. Accurate encoding of target location requires concomitant foveal and extraretinal signals. The most popular hypothesis to explain how trajectories are planned is that the trajectory is specified as a vector in the arm’s joint space, with joint angle variations controlled in a synergic way (temporal coupling). The motor command initially sent to the arm is based on an extrafoveal visual signal; at the end of the ocular saccade, the updated visual signal is used to adjust the ongoing trajectory. Because of consistent delays in sensorimotor loops, the rapid path corrections observed during reaching movements cannot be attributed to sensory information only but must rely on a “forward model” of arm dynamics. In any case, where this article focuses on how the hand is brought to a target, Grasping Movements: Visuomotor Transformations emphasizes the neural mechanisms that control the shaping of the hand itself to grasp an object, noting the crucial preshaping of the hand during reaching prior to grasping the object. The analysis emphasizes the cooperative computation of visual mechanisms in parietal cortex with motor mechanisms in premotor cortex to integrate sensing and corollary discharge throughout the movement. Cerebellum and Motor Control reviews a number of models of the role of the cerebellum in building “internal models” to improve motor skills. The article asserts that motor control and learning in the brain employ a modular approach in which multiple controllers coexist, with each controller suitable for one or a small set of contexts. 
The basic idea is that, to select the appropriate controller or controllers at each moment, each of the multiple inverse models is augmented with a forward model that determines the responsibility each controller should assume during movement. This view is exemplified in the MOSAIC (MOdular Selection And Identification Control) model. Recent human brain imaging studies have started to accumulate evidence supporting multiple internal models of tools in the cerebellum. (One caveat: The article stresses the idea that the cerebellum provides complete motor controllers; other authors emphasize the idea that the cerebellum provides a corrective side path that learns how best to augment controllers located elsewhere in the brain.) Finally, Basal Ganglia reviews the structure of this system in terms of multiple loops, with special emphasis on those involved in skeletomotor and oculomotor functions. It also reviews the role of dopamine in motor learning and the mechanisms underlying Parkinson’s disease.
Brain-Computer Interfaces
Decision Support Systems and Expert Systems
Filtering, Adaptive
Forecasting
Kalman Filtering: Neural Implications
Prosthetics, Motor Control
Prosthetics, Neural
Prosthetics, Sensory Systems
The road map Robotics and Control Theory presents a number of applications of neural networks. Here we offer a representative (but by no means exhaustive) set of other applications, a list that can be augmented by the study of many other road maps. Examples include a variety of topics in vision and speech processing (see the road maps Vision and Linguistics and Speech Processing, respectively). As noted in the Preface, the discussion of applications of ANNs in areas from astronomy to steel making was a feature of the first edition of the Handbook that is not reproduced in the second edition.
Several articles review the various contributions of adaptive neural networks to signal processing. Filtering, Adaptive notes that adaptive filtering has found widespread use in noise canceling and noise reduction, channel equalization, cochannel signal separation, system identification, pattern recognition, fetal heart monitoring, and array processing. The parameters of an adaptive filter are adjusted to “learn” or track signal and system variations according to a task-specific performance criterion. The field of adaptive filtering was derived from work on neural networks and adaptive pattern recognition. An adaptive filter can be viewed as a signal combiner consisting of a set of adjustable weights (or coefficients represented by a polynomial) and an algorithm (learning rule) that updates these weights using the filter input and output, as well as other available signals. The filter may include internal signal feedback, whereby delayed versions of the output are used to generate the current output, and it may contain some nonlinear components. The single-layer perceptron is a well-known type of adaptive filter that has a binary output nonlinearity (see “Perceptrons, Adalines, and Backpropagation”). The article focuses on the most widely used adaptive filter architecture and describes in some detail two representative adaptive algorithms: the least-mean-square algorithm and the constant modulus algorithm. Kalman Filtering: Neural Implications then introduces Kalman filtering, a powerful idea rooted in modern control theory and adaptive signal processing. Under linear and Gaussian conditions, the Kalman filter produces a recursive estimate of the hidden state of a dynamic system, i.e., one that is updated with each subsequent (noisy) measurement of the observed system, with the estimate being optimum in the mean-square-error sense. 
The Kalman filter provides an indispensable tool for the design of automatic tracking and guidance systems, and an enabling technology for the design of recurrent multilayer perceptrons that can simulate any finite-state machine. In the context of neurobiology, Kalman filtering provides insights into visual recognition and motor control. Related applications are discussed in Forecasting. Neural nets, mostly of the standard backpropagation type, have been used with great success in many forecasting applications. This article looks at the use of neural nets for forecasting with particular attention to understanding when they perform better or worse than other technologies, showing how the success of neural networks in forecasting depends significantly on the characteristics of the process being forecast.
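The least-mean-square algorithm mentioned above can be sketched as follows: the adjustable weights of the filter are nudged in proportion to the output error and the current input taps, here used for system identification. The unknown system, signal lengths, and step size are illustrative choices, not drawn from the article.

```python
import numpy as np

# Minimal LMS sketch: an adaptive filter identifies an unknown 3-tap FIR
# system from its input and output.  Names and values are illustrative.
rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3, 0.2])          # "unknown" system to identify
x = rng.standard_normal(2000)                 # input signal
d = np.convolve(x, h_true)[: len(x)]          # desired (observed) output

w = np.zeros(3)                               # adjustable filter weights
mu = 0.01                                     # step size (learning rate)
for n in range(2, len(x)):
    u = x[n - 2 : n + 1][::-1]                # tap vector [x[n], x[n-1], x[n-2]]
    e = d[n] - w @ u                          # output error
    w += mu * e * u                           # least-mean-square update
```

The weights converge toward the unknown coefficients; the same update, driven by a different error signal, underlies the noise-canceling and equalization applications listed in the article.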
A decision support system is an information system that helps humans make a decision on a given problem, under given circumstances and constraints. Expert systems are information systems that contain expert knowledge for a particular problem area and perform inferences when new data are entered that may be partial or inexact. They provide a solution that is expected to be similar to the solution provided by experts in the field. Decision Support Systems and Expert Systems uses the collective term decision system to refer to either a decision support system or an expert system. The article discusses how neural networks can be employed in a decision system. Such systems help humans in their decision process and so should be comprehensible by humans. The article reviews results of connectionist-based decision systems. In particular, trainable knowledge-based neural networks can be used to accumulate both knowledge (rules) and data, building adaptive decision systems with incremental, on-line learning. (For further developments related to the construction of expert systems, see “Bayesian Networks” and the three articles on “Graphical Models.”)
Brain-Computer Interfaces discusses the use of on-line analysis of brainwaves to derive information about a subject’s mental state as a basis for driving some external action, such as selecting a letter from a virtual keyboard or moving a robotic device. This provides an alternative communication and control channel that does not depend on the brain’s normal output pathway of peripheral nerves and muscles, a pathway that may be nonfunctional in some patients. The brainwave signals may be evoked potentials generated in response to external stimuli or components associated with spontaneous mental activity. Targets for current research include the extraction of local components of brain activity with fast dynamics that subjects can consciously control. The article reviews the challenge of developing classifiers that work while the subject operates a brain-actuated application, with ANNs providing robust approaches to on-line learning. These studies are complemented by a range of articles on prosthetics. Prosthetics, Neural provides an overview of the physical components that tend to be common to all neural prosthetic systems. It emphasizes the biophysical factors that constrain the sophistication of those interfaces. Electroneural interfaces for both stimulation of and recording from neural tissue are analyzed in terms of biophysics and electrochemistry. It is also shown how the design of practical neural prostheses must address the systems hardware issues of power and data management and packaging. Prosthetics, Sensory Systems focuses on sensory prostheses, in which information is collected by electronic sensors and delivered directly to the nervous system by electrical stimulation of pathways in or leading to the parts of the brain that normally process a given sensory modality. After assessing the amenability of all sensory modalities (hearing, vision, touch, proprioception, balance, smell, and taste), the article focuses on auditory and visual prostheses. 
The great success story has been with cochlear implants. Here the article reviews improved temporospatial representations of speech sounds, combined electrical and acoustic stimulation in patients with residual hearing, and psychophysical correlates of performance variability. Visual prostheses are still in their early days, with no general agreement on the most promising site to apply electrical stimulation to the visual pathways. The article reviews the cortical approach and the retinal approach. Finally, it is noted that since a prosthesis does not necessarily match natural neural encoding of a stimulus, the success of the prosthesis depends in part on the plasticity of the human brain as it remaps to accommodate this new class of signals. Prosthetics, Motor Control deals with the subset of neural prosthetic interfaces that employ electrical stimulation to alter the function of motor systems, either directly or indirectly. The article presents three clinical applications. Therapeutic electrical stimulation is electrically produced exercise in which the beneficial effect occurs primarily off-line as a result of trophic effects on muscles and perhaps the CNS; neuromodulatory stimulation is preprogrammed stimulation that directly triggers or modulates a function without ongoing control or feedback from the patient; and functional electrical stimulation (FES) provides precisely controlled muscle contractions that produce specific movements required by the patient to perform a task. The article describes subsystems for muscle stimulation, sensory feedback, sensorimotor regulation, control systems, and command signals, most of which are under development to improve on-line control of FES. Electrical stimulation of the nervous system is also being used to treat other disorders, including spinal cord stimulation to control pain and basal ganglia stimulation to control parkinsonian dyskinesias.
To close this road map, we note the importance of using special-purpose VLSI chips to gain the full efficiency of artificial neural networks in various applications. Such chips are among the methods for implementation of neural networks discussed in the next road map, Implementation and Analysis.
Analog VLSI Implementations of Neural Networks
Biophysical Mechanisms in Neuronal Modeling
Brain Signal Analysis
Databases for Neuroscience
Digital VLSI for Neural Networks
GENESIS Simulation System
Neuroinformatics
Neuromorphic VLSI Circuits and Systems
NEURON Simulation Environment
Neurosimulation: Tools and Resources
NSL Neural Simulation Language
Photonic Implementations of Neurobiologically Inspired Networks
Programmable Neurocomputing Systems
Silicon Neurons
Statistical Parametric Mapping of Cortical Activity Patterns
Briefly, a neural network (whether an artificial neural network for technological application or a simulation of a biological neural network in computational neuroscience) can be implemented in three main ways: by programming a general-purpose electronic computer, by programming an electronic computer designed for neural net implementation, or by building a special-purpose device to emulate a particular network or parametric family of networks. We discuss these three approaches in turn, and then review a number of articles describing tools and methods for the analysis of brain signals and related activity.
Neurosimulation: Tools and Resources reviews neurosimulators, i.e., programs designed to reduce the time and effort required to build models of neurons and neural networks. A neurosimulator requires, at the very least, a highly developed interface, a scalable design (e.g., through parallel hardware), and extendibility with new neural network paradigms. The review includes programs for modeling networks of biological neurons as well as programs for kinetic modeling of intracellular signaling cascades and regulatory genetic networks but does not cover connectionist simulators. It provides a general picture of the capabilities of several neurosimulators, highlighting some of the best features of the various programs, and also describes ongoing efforts to increase compatibility among the various programs. Compatibility allows models built with one neurosimulator to be independently evaluated and extended by investigators using different programs, thereby reducing duplication of effort, and also allows models describing different levels of complexity (molecular, cellular, network) to be related to one another. The next three articles present some of the methods necessary for efficient simulation of detailed models of single neurons (see, e.g., the articles “Axonal Modeling” and “Dendritic Processing” in the Biological Neurons and Synapses road map). Biophysical Mechanisms in Neuronal Modeling is a primer on biophysically detailed compartmental models of single neurons (see the road map Biological Neurons and Synapses for a fuller précis), but contributes to the topic of neurosimulators by illustrating examples of model definitions using the Surf-Hippo Neuron Simulation System, providing a minimal syntax that facilitates model documentation and analysis. 
GENESIS Simulation System describes GENESIS (GEneral NEural SImulation System), which was developed to support “structurally realistic” simulations, computer-based implementations of models designed to capture the anatomical structure and physiological characteristics of the neural system of interest. GENESIS has been widely used for single-cell “compartmental” modeling but is also used for large network models, using libraries of ion channels and complete cell models, respectively. NEURON is a neurosimulator that was first developed for simulating empirically based models of biological neurons with extended geometry and biophysical mechanisms that are spatially nonuniform and kinetically complex. This functionality has been enhanced to include extracellular fields, linear circuits to emulate the effects of nonideal instrumentation, models of artificial (integrate-and-fire) neurons, and networks that can involve any combination of artificial and biological neuron models. NEURON Simulation Environment shows how these capabilities have been implemented so as to achieve computational efficiency while maintaining conceptual clarity, i.e., the knowledge that what has been instantiated in the computer model is an accurate representation of the user’s conceptual model. Where NEURON has been primarily used for detailed modeling of single neurons, NSL Neural Simulation Language provides methods for simulating very large networks of relatively simple (artificial or biological simulation) neurons. NSL (pronounced “Nissl”) models focus on modularity, a well-known software development strategy in dealing with large and complex systems. Full understanding of a system is gained both by simulating modules in isolation and by designing computer experiments that follow the dynamics of the interactions between the various modules. 
An NSL model can be described either by direct programming in NSLM, the NSL (compiled) Modeling language, or by using the Schematic Capture System (SCS), a visual programming interface to NSLM supporting the description of module assemblages. “Phase-Plane Analysis of Neural Nets” introduces the qualitative theory of differential equations in the plane for analyzing neural networks. Computational methods are a very powerful adjunct to this type of analysis. The article concludes with comments on numerical methods and software. Between them, the articles reviewed in this paragraph make clear the challenge of providing multilevel neurosimulation environments in which one can move effortlessly between the levels of schemas (functional decomposition of an overall behavior), large neural networks, detailed models of single neurons, and neurochemical models of synaptic plasticity. To be fully effective, such an environment will also need visualization tools, and the ability to access a database to provide experimental results for comparison with model-based predictions.
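The kind of planar system to which phase-plane analysis applies can be sketched with a two-unit excitatory-inhibitory network integrated by forward Euler. The weights and inputs below are illustrative, chosen weak enough that every trajectory in the (E, I) plane settles to a single fixed point; stronger coupling can instead produce the limit cycles and bifurcations that the qualitative theory classifies.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two-unit excitatory-inhibitory network integrated with forward Euler;
# parameters are illustrative, picked so trajectories reach a fixed point.
wEE, wEI = 1.6, 1.2   # weights onto the excitatory unit
wIE, wII = 1.5, 0.5   # weights onto the inhibitory unit
iE, iI = 0.5, -0.2    # constant external inputs
dt = 0.05

E, I = 0.9, 0.1       # initial point in the (E, I) phase plane
for _ in range(4000):
    dE = -E + sigmoid(wEE * E - wEI * I + iE)
    dI = -I + sigmoid(wIE * E - wII * I + iI)
    E += dt * dE
    I += dt * dI
```

Plotting the two nullclines (dE = 0 and dI = 0) over such trajectories is the standard graphical step in the analysis the article describes.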
The next two articles address the digital, parallel implementation of neural networks. Digital VLSI for Neural Networks starts by looking at the differences between digital and analog design techniques, with a focus on analyzing cost-performance trade-offs in flexibility (Amdahl’s Law), and then considers the use of standard VLSI processors in parallel configurations for ANN emulation. The Adaptive Solutions CNAPS custom digital ANN processor is then discussed to convey a sense of some of the issues involved in designing digital structures for ANN emulation. Although this chip is no longer produced, it is still being used and provides a good vehicle for understanding the trade-offs inherent in emulating neural structures digitally. Finally, the article looks at field programmable gate array (FPGA) technology as a promising vehicle for digital implementation of ANNs. Programmable Neurocomputing Systems emphasizes that the design of specialized digital neurocomputers has exploited three items common to many neural (ANN) algorithms to improve cost/performance: the limited numeric precision required; the inherently high data parallelism, where the same operation is performed across large arrays of data; and communication patterns restricted enough to allow broadcast buses or unidirectional rings to support parallel execution of many common neural network algorithms. However, in the future, the work of commercial design teams to incorporate multimedia-style kernels into the workloads they consider during the design of new microprocessors will have as a by-product the ability to dramatically improve performance for ANN algorithms. This suggests that in the future there will be greatly reduced interest in special-purpose neurocomputers but much attention to software strategies to optimize ANN performance on commercially available microprocessors.
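The Amdahl's Law trade-off invoked above can be made concrete with a one-line formula: the fraction of a workload that cannot be accelerated bounds the total speedup, no matter how fast the accelerated part runs. The function name is our own shorthand.

```python
def amdahl_speedup(parallel_fraction: float, speedup_factor: float) -> float:
    """Overall speedup when only a fraction of the workload is accelerated
    (Amdahl's Law); the un-accelerated remainder bounds the total gain."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / speedup_factor)
```

For example, if 90% of an ANN emulation parallelizes across 10 processing units, the overall speedup is only about 5.3, and no number of units can push it past 10; this is why the flexibility of the serial portion matters so much in neurocomputer design.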
However, the above three assumptions are not so useful in the implementation of detailed “compartmental” models of neurons. Here, attention has been paid to the design of highly special-purpose analog VLSI circuits. Digital VLSI assigns a different circuit to each bit of information that is to be stored and processed. Each circuit is driven to the limit so that it settles into a 0-state or a 1-state, passing through a linear voltage-current regime to get from one saturation state to the other. Thus, if a synaptic weight is to be stored with eight-bit precision in digital VLSI, it requires eight such circuits. By contrast, the linear regime of a single circuit element on a VLSI chip can store data with about three bits of precision with far less “real estate” on the chip, and with far less power loss. The price, of course, is that precision cannot be guaranteed on the same scale as for digital circuits, but in many neural net applications, analog precision is more than adequate. Analog VLSI Implementations of Neural Networks provides an overview of the implementation of circuitry in analog VLSI, and then summarizes a number of technological implementations of such analog chips for ANNs. The article introduces the difference between the constraints imposed by the biological and silicon media and emphasizes that letting the silicon medium constrain the design of a system results in more efficient methods of computation. Special emphasis is given to five properties of a silicon synapse that are essential for building large-scale adaptive analog VLSI synaptic arrays. This article focuses on building neural network integrated circuits (IC), and especially on building connectionist neural network models. Silicon Neurons takes the same implementation methodology into the realm of computational neuroscience. Biological neural networks are difficult to model because they are composed of large numbers of nonlinear elements and have a wide range of time constants. 
Simulation on a general-purpose digital computer slows dramatically as the number and coupling of elements increase. By contrast, silicon neurons operate in real time, and the speed of the network is independent of the number of neurons or their coupling. On the other hand, high connectivity still poses problems in 2D chip layouts, and the design of special-purpose hardware is a significant investment, particularly if it is analog hardware, since analog VLSI still lacks a general set of easy-to-use design tools. In any case, Neuromorphic VLSI Circuits and Systems charts the virtues of using analog VLSI to build “neuromorphic” chips, i.e., chips whose design is based on the structure of actual biological neural networks. Biological systems excel at sensory perception, motor control, and sensorimotor coordination by sustaining high computational throughput with minimal energy consumption. Neuromorphic VLSI systems employ distributed and parallel representations and computation akin to those found in their biological counterparts. The high levels of system integration offered in VLSI technology make it attractive for the implementation of highly complex artificial neuronal systems, even though the physics of the liquid-crystalline state of biological structures is different from the physics of the solid-state silicon technologies. The article provides a basic foundation in device physics and presents a set of specific circuits that implement certain essential functions that exemplify the breadth possible within this design paradigm. However, VLSI-based neural networks have difficulty in scaling up or interconnecting multiple neural chips to incorporate large numbers of neuron units in highly interconnected architectures without significantly increasing the computational time. This motivates the use of optical interconnections. 
The success of optic fibers as media for telecommunications has been complemented by the use of holograms and spatial light modulators as mechanisms for storing and processing information via patterns of light (photonics) rather than patterns of electrons (electronics). The current state of photonic approaches to neural network implementation is charted in Photonic Implementations of Neurobiologically Inspired Networks, which provides a perspective on the use of holography as a technique for building adaptive connection matrices for ANNs, as well as earlier discussions of holography as a metaphor for the working of associative memory in actual brains. In photonic implementation of neurobiologically inspired networks, optical (free-space or through-substrate) techniques enable an increase in the number of neuron units and the interconnection complexity by using the off-chip (third) dimension. This merging of optical and photonic devices with electronic circuitry provides additional features such as parallel weight implementation, adaptation, and modular scalability.
The remaining articles provide a number of perspectives on the analysis of data on the brain.
Brain Signal Analysis reviews applications of ANNs to brain signal analysis, including analysis of the EEG and MEG, the electromyogram (EMG), and computed tomographic (CT) images and magnetic resonance (MR) brain images, and to series of functional MR brain images (fMRI). Since most medical signals usually are not produced by variations in a single variable or factor, many medical problems, particularly those involving decision making, must involve a multifactorial decision process. In these cases, changing one variable at a time to find the best solution may never reach the desired objective, whereas multifactorial ANN approaches may be more successful. The review is organized according to the nature of brain signals to be analyzed and the role that ANNs play in the applications.
Statistical Parametric Mapping of Cortical Activity Patterns describes the construction of statistical maps to test hypotheses about regionally specific effects like “activations” during brain imaging studies. Statistical parametric maps (SPMs) are image processes with voxel values that are, under the null hypothesis, distributed according to a known probability density function (usually Student’s T or F distributions), analyzing each and every voxel using any standard (univariate) statistical test. The resulting statistical parameters are assembled into an image, the SPM. SPM{T} refers to an SPM comprising T statistics; similarly, SPM{F} denotes an SPM of F statistics. SPMs are interpreted as spatially extended statistical processes by referring to the probabilistic behavior of stationary Gaussian fields. Unlikely excursions of the SPM are interpreted as regionally specific effects, attributable to the sensorimotor or cognitive process that has been manipulated experimentally.
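The construction of an SPM{T} can be sketched on simulated data: the same two-sample t test is applied independently at every voxel, and the resulting statistics are assembled into an image in which the experimentally induced effect stands out. The image size, effect size, and design below are illustrative, not from any actual imaging study.

```python
import numpy as np

# Sketch of an SPM{T}: a two-sample t test at each voxel, assembled into an
# image.  Simulated data: a 4x4 region responds in the experimental condition.
rng = np.random.default_rng(1)
n = 20
condition = np.tile([0, 1], n // 2)                 # alternating conditions
scans = rng.standard_normal((n, 16, 16))            # baseline noise
scans[condition == 1, 4:8, 4:8] += 1.5              # regionally specific effect

a, b = scans[condition == 1], scans[condition == 0]
na, nb = len(a), len(b)
pooled = ((na - 1) * a.var(0, ddof=1) + (nb - 1) * b.var(0, ddof=1)) / (na + nb - 2)
spm_t = (a.mean(0) - b.mean(0)) / np.sqrt(pooled * (1 / na + 1 / nb))
```

In practice the threshold applied to such a map must be corrected for the search over the whole field, which is where the Gaussian random field theory mentioned above comes in.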
Neuroinformatics presents an integrated view of neuroinformatics that combines tools for the storage and analysis of neuroscience data with the use of computational models in structuring masses of such data. In Europe, neuroinformatics is a term used to encompass the full range of computational approaches to brain theory and neural networks. In the United States, some people use the term neuroinformatics solely to refer to databases in neuroscience. Taking the perspective of the Handbook, this article sees the key challenge for neuroinformatics to be to integrate insights from synthetic data obtained from running a model with data obtained empirically from studying the animal or human brain. The problem is that the data, and thus the models, of neuroscience are so diverse. Neuroscience integrates anatomy, behavior, physiology, and chemistry, and studies levels from molecules to compartments and neurons up to biological neural networks and on to the behavior of organisms. The article thus presents an architecture for a federation of databases of empirical neuroscientific data in which results from diverse laboratories can be integrated. It further advocates a cumulative approach to modeling in neuroscience that facilitates the reusability (with appropriate changes) of modules within current neural models, with the pattern of re-use fully documented and tightly constrained by the linkage with this federation of databases. Databases for Neuroscience then focuses on the issues in constructing such databases. In order to be able to integrate such diverse sources, the various communities within the neurosciences must begin to develop standards for their community’s data. 
Neuroscientists use many different and incompatible data formats that do not allow for the free exchange of data, and the article stresses the need for standards for the description of the actual data (i.e., a formalized description of the metadata), possibly using extensible markup language (XML) technologies. (On a related theme, Neurosimulation: Tools and Resources examines two of the enabling neurosimulation technologies that will allow modelers to compare and modify models, verify one another’s simulations, and extend models with their own tools.) One possible solution to integrating data from sources with heterogeneous data and representation is to extend the conventional wrapper-mediator architecture with domain-specific knowledge. The article concludes with analysis of a specific database of brain images (the fMRI Data Center) and a comprehensive table of neuroscience databases constructed to date.
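A hypothetical flavor of the XML-based metadata the article calls for might look like the following; the element names here are illustrative only, not drawn from any published neuroscience standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML metadata record for a neurophysiology dataset; the element
# names are illustrative only, not drawn from any published standard.
rec = ET.Element("recording")
ET.SubElement(rec, "species").text = "rat"
ET.SubElement(rec, "region").text = "hippocampus, field CA1"
ET.SubElement(rec, "method").text = "extracellular single-unit"
ET.SubElement(rec, "sampling_rate", units="Hz").text = "20000"
xml_str = ET.tostring(rec, encoding="unicode")
```

The point of such a formalized description is that a mediator can parse it mechanically, so data from laboratories with otherwise incompatible formats can be located and combined.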