Abstract:
We investigate a learning algorithm for the classification of
nonnegative data by mixture models. Multiplicative update rules
are derived that directly optimize the performance of these
models as classifiers. The update rules have a simple closed form
and an intuitive appeal. Our algorithm retains the main virtues
of the Expectation-Maximization (EM) algorithm -- its guarantee
of monotonic improvement and its absence of tuning parameters --
with the added advantage of optimizing a discriminative objective
function. As a special case, the algorithm reduces to the method
of generalized iterative scaling for log-linear models. The
learning rate of the algorithm is controlled by the sparseness of
the training data. We use the method of nonnegative matrix
factorization (NMF) to discover sparse distributed
representations of the data. This form of feature selection
greatly accelerates learning and makes the algorithm practical on
large problems. Experiments show that discriminatively trained
mixture models lead to much better classification than comparably
sized models trained by EM.
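
The abstract invokes NMF only as a feature-extraction step; as a point of reference, the sketch below shows the standard multiplicative updates for NMF under a squared Euclidean objective (Lee and Seung). The function name `nmf`, the choice of objective, and the iteration count are illustrative assumptions; this is not the paper's discriminative update rule for mixture models.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9):
    """Factor a nonnegative matrix V (d x n) as W @ H, with W (d x r) and
    H (r x n), using the standard multiplicative updates for the squared
    Euclidean error ||V - WH||^2 (a sketch, not the paper's algorithm)."""
    d, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((d, r)) + eps          # nonnegative basis vectors
    H = rng.random((r, n)) + eps          # nonnegative encodings
    for _ in range(n_iter):
        # Each factor is rescaled elementwise; nonnegativity is preserved
        # because every term in the ratio is nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The columns of H would then serve as the sparse distributed representations mentioned in the abstract, on which the discriminatively trained mixture model operates.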