Abstract:
Learning curves for Gaussian process regression are well
understood when the 'student' model happens to match the
'teacher' (the true data-generating process). I derive
approximations to the learning curves for the more general case
of mismatched models, and find very rich behaviour: For large input space
dimensionality, where the results become exact, there are
universal (student-independent) plateaux in the learning curve,
with transitions in between that can exhibit arbitrarily many
over-fitting maxima; over-fitting can occur even if the student
estimates the teacher noise level correctly. In lower dimensions,
plateaux also appear, and the learning curve remains dependent on
the mismatch between student and teacher even in the asymptotic
limit of a large number of training examples. Learning with
excessively strong smoothness assumptions can be particularly
dangerous: For example, a student with a standard radial basis
function covariance function will learn a rougher teacher
function only logarithmically slowly. All predictions are
confirmed by simulations.
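
As a concrete illustration of the kind of simulation referred to above, the following minimal sketch (a hypothetical setup, not the paper's actual experimental protocol) estimates a learning curve for a smooth RBF student trained on data from a rougher Ornstein-Uhlenbeck teacher on the unit interval. The covariance functions, lengthscales, noise levels, and averaging scheme are all illustrative assumptions; note that the student's assumed noise variance is set equal to the teacher's, matching the scenario in which over-fitting can occur despite a correctly estimated noise level.

import numpy as np

rng = np.random.default_rng(0)

# Covariance functions on [0, 1]: teacher is rough (OU), student is smooth (RBF).
def k_teacher(x, y, ell=0.1):          # Ornstein-Uhlenbeck: rough sample paths
    return np.exp(-np.abs(x[:, None] - y[None, :]) / ell)

def k_student(x, y, ell=0.1):          # radial basis function: smooth prior
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2 * ell**2))

noise_teacher = 0.05 ** 2              # true noise variance
noise_student = 0.05 ** 2              # student's assumed noise (matched here)

def generalization_error(n, n_test=200, n_runs=50):
    """Squared error of the student's posterior mean, averaged over
    teacher functions, noise, and random training inputs."""
    x_test = np.linspace(0, 1, n_test)
    errs = []
    for _ in range(n_runs):
        x = rng.uniform(0, 1, n)
        # Draw one teacher function jointly on train and test inputs.
        xs = np.concatenate([x, x_test])
        K = k_teacher(xs, xs) + 1e-10 * np.eye(len(xs))  # jitter for stability
        f = rng.multivariate_normal(np.zeros(len(xs)), K)
        y = f[:n] + rng.normal(0, np.sqrt(noise_teacher), n)
        # Student posterior mean under its own (mismatched) covariance.
        Ks = k_student(x, x) + noise_student * np.eye(n)
        alpha = np.linalg.solve(Ks, y)
        mean = k_student(x_test, x) @ alpha
        errs.append(np.mean((mean - f[n:]) ** 2))
    return np.mean(errs)

# Trace out the learning curve: error versus number of training examples.
for n in [10, 20, 40, 80, 160]:
    print(n, generalization_error(n))

Under these assumptions, the printed error decays only very slowly with n, consistent with the logarithmically slow learning predicted for a smooth student facing a rougher teacher.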