Neural Computation

January 1995, Vol. 7, No. 1, Pages 144-157
(doi: 10.1162/neco.1995.7.1.144)
© 1995 Massachusetts Institute of Technology
Empirical Risk Minimization versus Maximum-Likelihood Estimation: A Case Study

We study the interaction between input distributions, learning algorithms, and finite sample sizes when learning classification tasks. Focusing on normal input distributions, we use statistical mechanics techniques to calculate the empirical and expected (or generalization) errors of several well-known algorithms for learning the weights of a single-layer perceptron. When the distribution within each class is spherically symmetric, we find that the simple Hebb rule, which corresponds to maximum-likelihood parameter estimation, outperforms the other, more complex algorithms based on error minimization. Moreover, we show that in the regime where the overlap between the classes is large, algorithms with low empirical error generalize worse, a phenomenon known as overtraining.
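The comparison in the abstract can be illustrated numerically with a small Monte Carlo sketch. This is not the paper's statistical-mechanics calculation. It assumes two equally likely classes with means +mu and -mu and a shared spherical covariance sigma^2 I, a perceptron with no threshold, and the classic Rosenblatt perceptron as a stand-in for an "error-minimizing" algorithm; the function names are hypothetical. Under this model, y·x ~ N(mu, sigma^2 I), so the Hebb weight vector (the mean of y_i x_i) is exactly the maximum-likelihood estimate of mu. The expected error of sign(w·x) then has the closed form Phi(-w·mu / (sigma ||w||)), where Phi is the standard normal CDF.

```python
import math

import numpy as np


def sample(n, mu, sigma, rng):
    """Assumed model: y = +/-1 with equal probability, x ~ N(y * mu, sigma^2 I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + sigma * rng.standard_normal((n, mu.size))
    return x, y


def hebb(x, y):
    """Hebb rule: w = (1/n) sum_i y_i x_i, the ML estimate of mu under this model."""
    return (y[:, None] * x).mean(axis=0)


def perceptron(x, y, epochs=100):
    """Rosenblatt perceptron through the origin (a stand-in error-driven rule).

    Updates only on misclassified points; stops early if an epoch has no mistakes.
    On overlapping (non-separable) data it simply runs for `epochs` passes.
    """
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(x, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            break
    return w


def empirical_error(w, x, y):
    """Fraction of training points misclassified by sign(w . x) (ties count as errors)."""
    return float(np.mean(y * (x @ w) <= 0))


def expected_error(w, mu, sigma):
    """Exact generalization error under the assumed model: Phi(-w.mu / (sigma ||w||))."""
    z = (w @ mu) / (sigma * np.linalg.norm(w))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_dim, n_train, sigma = 50, 100, 1.0
    mu = np.ones(n_dim) / math.sqrt(n_dim)  # ||mu|| = 1: heavily overlapping classes
    x, y = sample(n_train, mu, sigma, rng)
    for name, w in [("hebb", hebb(x, y)), ("perceptron", perceptron(x, y))]:
        print(name, empirical_error(w, x, y), expected_error(w, mu, sigma))
```

Varying ||mu|| / sigma (the class overlap) and the ratio of training set size to input dimension in this sketch shows how the training and generalization errors of the two rules move relative to each other; it makes no claim to reproduce the paper's exact curves.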