Abstract:
In previous work on ``transformed mixtures of Gaussians'' and
``transformed hidden Markov models'', we showed how the EM
algorithm in a discrete latent variable model can be used to
jointly normalize data (e.g., center images, pitch-normalize
spectrograms) and learn a mixture model of the normalized data.
The only inputs to the algorithm are the data, a list of possible
transformations, and the number of clusters to find. The main
criticism of this work was that the exhaustive computation of the
posterior probabilities over transformations would make scaling
up to large feature vectors and large sets of transformations
intractable. Here, we describe how a tremendous speed-up is
achieved through the use of a variational technique for
decoupling transformations, and a fast Fourier transform method
for computing posterior probabilities. For
N
×
N
images, learning
C
clusters under
N
rotations,
N
scales,
N
x
-translations and
N
y
-translations takes only (
C
+ 2 log
N
)
N
2
scalar operations per iteration. In contrast, the original
algorithm takes
C
N
6
operations to account for these transformations. We give results
on learning a 4-component mixture model from a video sequence
with frames of size 320 × 240. The model accounts for 360
rotations and 76,800 translations. Each iteration of EM takes
only 10 seconds per frame in MATLAB, which is over 5
million
times faster than the original algorithm.
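
To make the FFT step of the abstract concrete, the following is a minimal NumPy sketch (not the paper's code) of how a posterior over all $N^2$ circular translations and $C$ clusters can be computed with 2-D FFTs under an isotropic Gaussian observation model; the function name, variable names, and the assumption of known means and fixed noise variance are my own illustrative choices, and rotations and scales (handled in the paper by the variational decoupling) are omitted.

```python
import numpy as np

def translation_log_posteriors(x, mus, log_priors, sigma2):
    """Log-posterior over all N*N circular translations and C clusters,
    computed in O(C * N^2 log N) time with 2-D FFTs.

    x          : (N, N) observed image
    mus        : (C, N, N) cluster means (assumed known for this sketch)
    log_priors : (C,) log mixing proportions
    sigma2     : isotropic pixel-noise variance (assumed fixed)
    """
    C, N, _ = mus.shape
    Fx = np.fft.fft2(x)
    x_energy = np.sum(x ** 2)  # constant over circular shifts
    log_post = np.empty((C, N, N))
    for c in range(C):
        # Correlation of x with mu_c at every circular shift, via the
        # cross-correlation theorem: one forward and one inverse FFT.
        corr = np.fft.ifft2(Fx * np.conj(np.fft.fft2(mus[c]))).real
        # Squared distance under each shift:
        # ||shift(x) - mu_c||^2 = sum(x^2) - 2*corr + sum(mu_c^2)
        dist2 = x_energy - 2.0 * corr + np.sum(mus[c] ** 2)
        log_post[c] = log_priors[c] - 0.5 * dist2 / sigma2
    # Normalize jointly over clusters and all translations (log-sum-exp).
    m = log_post.max()
    log_post -= m + np.log(np.exp(log_post - m).sum())
    return log_post
```

The key point this sketch illustrates is that the Gaussian log-likelihoods for every translation are obtained from a single pair of FFTs per cluster, rather than by explicitly shifting the image $N^2$ times.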