Abstract:
Unsupervised learning algorithms are designed to extract
structure from data samples. Reliable and robust inference requires
a guarantee that extracted structures are typical for the data
source, i.e., similar structures have to be inferred from a second
sample set of the same data source. The overfitting phenomenon in
maximum-entropy-based annealing algorithms is studied by way of example
for a class of histogram clustering models. Bernstein's inequality
for large deviations is used to determine the maximal achievable
approximation quality parameterized by a minimal temperature. Monte
Carlo simulations support the proposed model selection criterion by
finite-temperature annealing.