Abstract:
Psychophysical and physiological evidence shows that the perceived
location of acoustic signals is strongly influenced by their
synchrony with visual signals. This effect, known as ventriloquism,
is at work when sound coming from the side of a TV set seems to
come from the mouths of the actors. The ventriloquism
effect suggests that there is important information about sound
location encoded in the synchrony between the audio and video
signals. In spite of this evidence, audiovisual synchrony is rarely
used as a source of information in computer vision tasks. In this
paper we explore the use of audiovisual synchrony to locate sound
sources. We developed a system that searches for regions of the
visual landscape that correlate highly with the acoustic signals
and tags them as likely to contain an acoustic source. We describe
our experience implementing the system, present results on a
speaker localization task, and discuss potential applications of the
approach.
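The idea of searching the image for regions that correlate with the audio can be illustrated with a minimal sketch. It is not the system described in this paper: it assumes that synchrony is measured as the absolute Pearson correlation between each pixel's intensity over time and a per-frame audio energy value. The function names (`pearson`, `synchrony_map`, `locate_source`) and the toy data are hypothetical, chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def synchrony_map(frames, audio_energy):
    """Absolute correlation of each pixel's intensity over time with the audio.

    frames: list of T frames, each a list of rows of pixel intensities.
    audio_energy: list of T values, one per frame (an assumed audio feature).
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [abs(pearson([f[r][c] for f in frames], audio_energy))
         for c in range(width)]
        for r in range(height)
    ]


def locate_source(frames, audio_energy):
    """Return (row, col) of the pixel most synchronous with the audio."""
    sync = synchrony_map(frames, audio_energy)
    candidates = ((r, c) for r in range(len(sync)) for c in range(len(sync[0])))
    return max(candidates, key=lambda rc: sync[rc[0]][rc[1]])


# Toy data: pixel (1, 2) follows the audio energy; every other pixel is constant.
audio = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
frames = [[[0.5] * 4 for _ in range(3)] for _ in audio]
for frame, energy in zip(frames, audio):
    frame[1][2] = 2.0 * energy + 0.1

print(locate_source(frames, audio))  # prints (1, 2)
```

In this toy case only one pixel varies with the audio, so it is tagged as the likely source. On real video, a per-pixel measure of this kind would presumably need to be computed over short time windows and smoothed over neighboring pixels before it could be used to tag regions.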