MIT CogNet, The Brain Sciences ConnectionFrom the MIT Press, Link to Online Catalog
SPARC Communities
Subscriber : Stanford University Libraries » LOG IN

space

Powered By Google 
Advanced Search

 

Very Fast EM-based Mixture Model Clustering using Multiresolution kd-trees

 Andrew W. Moore
  
 

Abstract:
Clustering is important in many fields including manufacturing, biology, finance, and astronomy. Mixture models are a popular approach due to their statistical foundations, and EM is a very popular method for finding mixture models. EM, however, requires many accesses of the data, and thus has been dismissed as impractical for data mining of enormous datasets. We present a new algorithm, based on the multiresolution kd-trees of our earlier work, which dramatically reduces the cost of EM-based clustering, with savings rising linearly with the number of datapoints. Although presented here for maximum likelihood estimation of Gaussian mixture models, it is also applicable to non-Gaussian models (provided class densities are monotonic in Mahalanobis distance), mixed categorical/numeric clusters, and Bayesian methods such as Autoclass.

 
 


© 2010 The MIT Press
MIT Logo