Abstract: Humans can easily reach for visual and auditory
targets despite the fact that these modalities use distinct frames
of reference to encode the location of objects (eye-centered for
vision, head-centered for audition). How does the brain solve this
problem? The traditional solution involves a remapping of the
sensory targets into a reference frame shared across modalities in
either joint-centered coordinates (the motor coordinates for
reaching) or possibly eye-centered coordinates. This predicts the
existence of multimodal neurons with spatially invariant receptive
fields in the shared reference frame. There is little support for
this prediction. Instead, the vast majority of receptive fields are gain modulated and/or partially shifted by posture signals. It is sometimes argued that this nevertheless constitutes a neural representation of a common reference frame, with the lack of invariance attributed simply to the notorious sloppiness of biological neural circuits. We argue instead that these response properties are
precisely what is expected from a network solving three important
problems simultaneously: (1) sensorimotor transformations from any modality to any behavior, (2) multisensory integration and prediction, and (3) motor feedback to sensory perception. Our argument
is based on a neural network model that solves these three problems. Its units integrate multiple sensory and motor signals using non-invariant (partially shifting) and gain-modulated
receptive fields. This results in an intermediate representation of
space that cannot be reduced to one frame of reference.
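As a rough illustration of the kind of representation described above (not the authors' actual model), the following minimal Python sketch shows a basis-function layer in the spirit of Pouget-and-Sejnowski-style networks: each unit multiplies an eye-centered visual tuning curve by an eye-position gain field, so its receptive field is gain modulated rather than invariant in any single frame, yet a simple linear readout of the layer can approximately recover a head-centered target location (e.g., for reaching toward a sound). The unit counts, tuning widths, and 1-D geometry are illustrative assumptions.

```python
# Illustrative sketch only: gain-modulated basis-function units and a linear
# head-centered readout. All parameter values are assumptions for this demo.
import numpy as np

def gaussian(x, center, sigma=10.0):
    """Gaussian visual tuning over retinal (eye-centered) position, in degrees."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sigmoid(x, threshold, slope=0.1):
    """Monotonic gain field over eye position, in degrees."""
    return 1.0 / (1.0 + np.exp(-slope * (x - threshold)))

# Preferred retinal locations and eye-position thresholds tile the workspace.
retinal_prefs = np.linspace(-40, 40, 9)
eye_thresholds = np.linspace(-20, 20, 9)

def basis_layer(retinal_target, eye_position):
    """Each unit's response = visual tuning x eye-position gain.

    The multiplicative gain makes the receptive fields gain modulated
    (and effectively partially shifting) rather than invariant in any
    single reference frame.
    """
    visual = gaussian(retinal_target, retinal_prefs[:, None])   # shape (9, 1)
    gain = sigmoid(eye_position, eye_thresholds[None, :])       # shape (1, 9)
    return (visual * gain).ravel()                               # shape (81,)

# Fit a linear readout that maps the layer to the head-centered target
# (retinal position + eye position) over a grid of training examples.
samples = [(r, e) for r in np.linspace(-30, 30, 13)
                  for e in np.linspace(-15, 15, 13)]
X = np.array([basis_layer(r, e) for r, e in samples])
y = np.array([r + e for r, e in samples])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

r_test, e_test = 12.0, -8.0
estimate = basis_layer(r_test, e_test) @ w
print(f"true head-centered target: {r_test + e_test:.1f} deg, "
      f"decoded: {estimate:.1f} deg")
```

Because the same layer supports linear readouts into eye-centered, head-centered, or motor coordinates, the intermediate representation itself need not be aligned with any single reference frame, which is the point the abstract makes about the non-invariant receptive fields.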