Abstract:
We discuss the problem of catastrophic fusion in multimodal
recognition systems. This problem arises in systems that need to
fuse different channels in non-stationary environments. Practice
shows that when recognition modules within each modality are tested
in contexts inconsistent with their assumptions, their influence on
the fused product tends to increase, with catastrophic results. We
explore a principled solution to this problem based upon Bayesian
ideas of competitive models and inference robustification: each
sensory channel is provided with simple white-noise context models,
and the perceptual hypothesis and context are jointly estimated.
Consequently, context deviations are interpreted as changes in the
strength of the white-noise contamination, automatically adjusting
the influence of each module. The approach is tested on a
fixed-lexicon automatic audiovisual speech recognition task with
very good results.
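
To make the mechanism concrete, the following is a minimal sketch (in Python) of jointly estimating the class hypothesis and a per-channel white-noise level under a simplified Gaussian-template model. The abstract does not specify the underlying recognizers, so the model form, function names, and data layout here are illustrative assumptions, not the system described in the paper.

import numpy as np

def channel_log_likelihood(x, template, sigma2):
    # Gaussian log-likelihood of observation x around a class template,
    # with white-noise contamination of variance sigma2.
    d = x.size
    return -0.5 * (d * np.log(2.0 * np.pi * sigma2)
                   + np.sum((x - template) ** 2) / sigma2)

def fuse_and_classify(audio_obs, video_obs, audio_templates, video_templates):
    # Jointly choose the class hypothesis and per-channel noise levels
    # that maximize the combined likelihood.  A channel operating outside
    # its training context yields a large estimated noise variance, which
    # flattens its likelihood and lowers its influence on the fused decision.
    best_class, best_score = None, -np.inf
    for c in audio_templates:
        # Closed-form maximum-likelihood noise variance for each channel
        # under hypothesis c (small constant avoids division by zero).
        sa2 = np.mean((audio_obs - audio_templates[c]) ** 2) + 1e-12
        sv2 = np.mean((video_obs - video_templates[c]) ** 2) + 1e-12
        score = (channel_log_likelihood(audio_obs, audio_templates[c], sa2)
                 + channel_log_likelihood(video_obs, video_templates[c], sv2))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

In this sketch the estimated noise variance plays the role of the context variable: the larger the mismatch between a channel's input and what its model expects, the weaker that channel's contribution to the fused decision.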