Abstract:
The hierarchical representation of data has various
applications in domains such as data mining, machine vision, or
information retrieval. In this paper we introduce an extension of
the Expectation-Maximization (EM) algorithm that learns mixture
hierarchies in a computationally efficient manner. Efficiency is
achieved by progressing in a bottom-up fashion, i.e. by clustering
the mixture components of a given level in the hierarchy to obtain
those of the level above. This clustering requires knowledge of the
mixture parameters only, so there is no need to resort to
intermediate samples. In addition to practical applications, the
algorithm allows a new interpretation of EM that makes clear the
relationship with non-parametric kernel-based estimation methods,
provides explicit control over the trade-off between the bias and
variance of EM estimates, and offers new insights about the
behavior of deterministic annealing methods commonly used with EM
to escape local minima of the likelihood.
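
The sketch below is a rough illustration of the bottom-up step described in the abstract: the Gaussian components of one level are clustered into a smaller mixture one level up using only their parameters, with each child component treated as a block of virtual samples of size proportional to its weight. It reflects a common formulation of this idea rather than a verbatim transcription of the paper's equations; the function name, the value of `n_virtual` (the assumed size of the original sample), and the initialisation scheme are all illustrative assumptions.

```python
import numpy as np

def cluster_components(pi_c, mu_c, cov_c, n_parents, n_virtual=1000,
                       n_iter=50, seed=0):
    """Cluster the Gaussian components of a lower-level mixture
    (weights pi_c, means mu_c, covariances cov_c) into a smaller
    parent mixture, using only these parameters -- no raw samples.

    Illustrative sketch: n_virtual (assumed size of the original
    sample) and the initialisation are assumptions, not taken from
    the paper.
    """
    rng = np.random.default_rng(seed)
    k, d = mu_c.shape
    m = pi_c * n_virtual                      # virtual points per child component

    # initialise parents from a random subset of child components
    idx = rng.choice(k, size=n_parents, replace=False)
    pi_p = np.full(n_parents, 1.0 / n_parents)
    mu_p, cov_p = mu_c[idx].copy(), cov_c[idx].copy()

    for _ in range(n_iter):
        # E-step: responsibility of parent j for child i, treating child i
        # as a block of m[i] virtual samples with mean mu_c[i], covariance cov_c[i]
        log_h = np.empty((k, n_parents))
        for j in range(n_parents):
            inv = np.linalg.inv(cov_p[j])
            _, logdet = np.linalg.slogdet(cov_p[j])
            diff = mu_c - mu_p[j]                              # (k, d)
            mahal = np.einsum('id,de,ie->i', diff, inv, diff)  # (mu_i-mu_j)^T S_j^-1 (mu_i-mu_j)
            tr = np.einsum('de,ied->i', inv, cov_c)            # tr(S_j^-1 S_i)
            log_g = -0.5 * (d * np.log(2 * np.pi) + logdet + mahal + tr)
            log_h[:, j] = m * log_g + np.log(pi_p[j])
        log_h -= log_h.max(axis=1, keepdims=True)              # stabilise before exp
        h = np.exp(log_h)
        h /= h.sum(axis=1, keepdims=True)

        # M-step: closed-form updates computed from the child parameters
        w = h * m[:, None]                                     # (k, n_parents)
        pi_p = w.sum(axis=0) / n_virtual
        mu_p = (w.T @ mu_c) / w.sum(axis=0)[:, None]
        for j in range(n_parents):
            diff = mu_c - mu_p[j]
            outer = diff[:, :, None] * diff[:, None, :]        # (k, d, d)
            cov_p[j] = np.einsum('i,ide->de',
                                 w[:, j] / w[:, j].sum(),
                                 cov_c + outer)
    return pi_p, mu_p, cov_p
```

Under this reading, each level of the hierarchy would be obtained by one such call, with the returned parent mixture fed back in as the children of the next level up.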