Abstract:
We study on-line generalized linear regression with
multidimensional outputs, i.e., neural networks with multiple
output nodes but no hidden nodes. At the final layer we allow
transfer functions, such as the softmax function, that depend on
the linear activations of all the output neurons. We also
use a parameterization function which transforms parameter vectors
maintained by the algorithm into the actual weights. The on-line
algorithm we consider updates the parameters in an additive manner,
analogous to the delta rule, but because the actual weights are
obtained via the possibly nonlinear parameterization function, the
weights themselves may behave very differently. Our approach is based on
applying the notion of a matching loss function in two different
contexts. First, we measure the loss of the algorithm in terms of
the loss that matches the transfer function used to produce the
outputs. Second, the loss function that matches the
parameterization function can be used both as a measure of distance
between models in motivating the update rule of the algorithm and
as a potential function in analyzing its performance relative to
an arbitrary fixed model. As a result, we have a
unified treatment that generalizes earlier results for the Gradient
Descent and Exponentiated Gradient algorithms to multidimensional
outputs, including multiclass logistic regression.
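To make the setting concrete, the following is a minimal sketch in Python/NumPy (the function and variable names are illustrative, not taken from the paper) of an additive parameter update of the kind described above: the gradient of the matching loss for the softmax transfer function is taken with respect to the actual weights and applied additively to the parameters, so the identity parameterization behaves like Gradient Descent while a row-wise normalized-exponential parameterization yields Exponentiated-Gradient-style multiplicative behaviour of the weights.

    import numpy as np

    def softmax(a):
        # Softmax transfer function over the linear activations.
        e = np.exp(a - a.max())
        return e / e.sum()

    def additive_update(theta, x, y, eta, param_fn):
        # One additive, delta-rule-style step on the parameter matrix theta.
        # The actual weight matrix is W = param_fn(theta); the prediction is
        # softmax(W x), whose matching loss (the log loss) has gradient
        # (y_hat - y) x^T with respect to W.  Applying that gradient
        # additively to the parameters gives Gradient Descent for the
        # identity parameterization and a multiplicative weight update for
        # the normalized-exponential parameterization.
        W = param_fn(theta)
        y_hat = softmax(W @ x)
        grad_W = np.outer(y_hat - y, x)
        return theta - eta * grad_W

    # Identity parameterization: parameters are the weights (GD-like).
    gd_param = lambda theta: theta

    # Normalized exponential per output row (EG-like weight behaviour).
    eg_param = lambda theta: np.apply_along_axis(softmax, 1, theta)

    # Example: 3 inputs, 2 output classes, one EG-style step.
    theta = np.zeros((2, 3))
    x = np.array([0.2, 0.5, 0.3])
    y = np.array([1.0, 0.0])
    theta = additive_update(theta, x, y, eta=0.5, param_fn=eg_param)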