Abstract:
We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the inverse of the Fisher information matrix. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n^2), while the complexity of conventional algorithms for the same purpose is of order O(n^3). The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer or multi-layer perceptrons. It is confirmed by simulation that the natural gradient descent learning rule is not only efficient but also robust.
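To illustrate the update rule the abstract refers to, the following is a minimal sketch of natural gradient descent on a toy quadratic loss, not the paper's algorithm: the loss, the matrix A, the learning rate, and the use of A as a stand-in Fisher information matrix are all assumptions made for this example. The point is only the form of the update, w <- w - lr * F^{-1} grad(w); the paper's actual contribution is a representation of F for a stochastic multi-layer perceptron that lets F^{-1} be computed in O(n^2) rather than the O(n^3) of a generic matrix inversion.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w with known curvature A.
# For this hypothetical model we simply take the Fisher information
# matrix F to equal A; in the paper F comes from a stochastic MLP.
A = np.array([[5.0, 0.0],
              [0.0, 0.5]])

def grad(w):
    # Euclidean gradient of the toy loss: dL/dw = A w.
    return A @ w

def natural_gradient_step(w, lr=0.5):
    # Natural gradient update: w <- w - lr * F^{-1} grad(w).
    # np.linalg.solve applies F^{-1} without forming the inverse
    # explicitly; this generic solve is O(n^3), which is exactly
    # the cost the paper's representation reduces to O(n^2).
    return w - lr * np.linalg.solve(A, grad(w))

w = np.array([1.0, 1.0])
for _ in range(10):
    w = natural_gradient_step(w)
```

Because F here equals the true curvature, F^{-1} grad(w) = w, so every step contracts all coordinates at the same rate regardless of how ill-conditioned A is; plain gradient descent on the same loss would converge much more slowly along the flat direction. This conditioning-independence is the usual motivation for paying the cost of inverting F.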