Abstract:
Bayesian predictions are stochastic, just like the predictions of
any other inference scheme that generalizes from a finite sample.
While a simple variational argument shows that Bayes averaging is
generalization optimal when the prior matches the teacher's
parameter distribution, the situation is less clear when the
teacher distribution is unknown. I define a class of averaging procedures,
the temperated likelihoods, including both Bayes averaging with a
uniform prior and maximum likelihood estimation as special cases. I
show that Bayes is generalization optimal in this family for any
teacher distribution for two learning problems that are
analytically tractable: learning the mean parameter of a Gaussian
and asymptotics of smooth learners.
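
To make the family concrete, the following is a minimal numerical
sketch (my own illustration, not code from the paper) of a
temperated-likelihood predictive for the Gaussian-mean problem. The
inverse-temperature symbol beta and all function names are
illustrative assumptions: with a flat prior, raising the likelihood
to the power beta gives a Gaussian posterior N(xbar, sigma2/(beta*N)),
so beta = 1 recovers Bayes averaging with a uniform prior and
beta -> infinity recovers the maximum-likelihood plug-in.

    # Illustrative sketch only: temperated-likelihood predictives for
    # learning the mean of a Gaussian with known variance and a flat
    # prior on the mean. beta interpolates between Bayes averaging
    # (beta = 1) and maximum likelihood (beta -> infinity).
    import numpy as np

    def tempered_predictive(x_train, sigma2, beta):
        """Mean and variance of the tempered posterior predictive.

        With a flat prior, the likelihood raised to the power beta
        yields a Gaussian posterior over the mean, N(xbar,
        sigma2 / (beta * n)); the predictive adds the noise sigma2.
        """
        n = len(x_train)
        xbar = np.mean(x_train)
        return xbar, sigma2 + sigma2 / (beta * n)

    def avg_neg_log_pred(x_test, mean, var):
        """Average negative log predictive density on test data
        (a proxy for the generalization error)."""
        return np.mean(0.5 * np.log(2 * np.pi * var)
                       + 0.5 * (x_test - mean) ** 2 / var)

    rng = np.random.default_rng(0)
    mu_true, sigma2, n = 0.7, 1.0, 10
    x_train = rng.normal(mu_true, np.sqrt(sigma2), size=n)
    x_test = rng.normal(mu_true, np.sqrt(sigma2), size=100_000)

    for beta in [0.25, 0.5, 1.0, 2.0, 1e6]:  # beta = 1 is Bayes
        m, v = tempered_predictive(x_train, sigma2, beta)
        print(f"beta={beta:g}: test loss = "
              f"{avg_neg_log_pred(x_test, m, v):.4f}")

Note that on any single training set the loss-minimizing beta
fluctuates; the optimality claim in the abstract concerns the
average over training sets and teacher parameters.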