Abstract:
Ever since Pearl's probability propagation algorithm in
graphs with cycles was shown to produce excellent results for
error-correcting decoding a few years ago, we have been curious
about whether local probability propagation could be used
successfully for machine learning. One of the simplest adaptive
models is the factor analyzer, which is a two-layer network that
models bottom-layer sensory inputs as a linear combination of
top-layer factors plus independent Gaussian sensor noise. The number of
bottom-up/top-down iterations needed to exactly infer the factors
given a network and an input is equal to the number of factors in
the top layer. In online learning, this iterative procedure must be
reinitialized upon each pattern presentation, so learning
becomes prohibitively slow in large networks, such as those used for
face recognition and for large-scale models of the cortex. We show
that local probability propagation in the factor analyzer network
usually takes just a few iterations to perform accurate inference,
even in networks with 320 sensors and 80 factors. We derive an
expression for the algorithm's fixed point and show that this fixed
point matches the exact solution in a variety of networks, even
when the fixed point is unstable. We also show that this method can
be used successfully to perform inference for approximate EM, and we
give results on face recognition.
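
For concreteness, the generative model summarized above can be written in standard factor analysis notation; the symbols below are illustrative choices and not necessarily the paper's own. Bottom-layer sensory inputs x are modeled as a linear combination of top-layer factors z plus independent Gaussian sensor noise:

\[
\mathbf{x} = \Lambda \mathbf{z} + \boldsymbol{\nu}, \qquad
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I), \qquad
\boldsymbol{\nu} \sim \mathcal{N}(\mathbf{0}, \Psi),
\]

where \(\Lambda\) is the factor loading matrix and \(\Psi\) is a diagonal sensor-noise covariance, so that the noise on each sensor is independent.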