Learning from Infinite Data in Finite Time

Pedro Domingos and Geoff Hulten

Abstract:

We propose the following general method for scaling learning algorithms to arbitrarily large data sets. Consider the model $M_{\vec{n}}$ learned by the algorithm using $n_i$ examples in step $i$ ($\vec{n} = (n_1, \ldots, n_m)$), and the model $M_\infty$ that would be learned using infinite examples. Upper-bound the loss $L(M_{\vec{n}}, M_\infty)$ between them as a function of $\vec{n}$, and then minimize the algorithm's time complexity $f(\vec{n})$ subject to the constraint that $L(M_{\vec{n}}, M_\infty)$ be at most $\epsilon$ with probability at least $1 - \delta$. We apply this method to the EM algorithm for mixtures of Gaussians. Preliminary experiments on a series of large data sets provide evidence of the potential of this approach.
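The abstract only outlines the method, so the following is a minimal sketch of the idea rather than the paper's actual derivation: run EM on a sample, and grow the per-step sample size $n_i$ only until a concentration bound guarantees the finite-data model is within $\epsilon$ of the infinite-data model with probability at least $1 - \delta$. A plain Hoeffding radius stands in for the paper's loss bound (which is not given here), and all names (`stream`, `scaled_em`, `n0`) are illustrative assumptions.

```python
import numpy as np

def hoeffding_radius(n, value_range, delta):
    """With probability at least 1 - delta, the empirical mean of n i.i.d.
    observations confined to an interval of width value_range lies within
    this radius of the true mean.  (A stand-in for the paper's loss bound,
    which the abstract does not state.)"""
    return value_range * np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def em_step(X, means, variances, weights):
    """One EM step for a one-dimensional mixture of Gaussians."""
    # E-step: responsibilities r[i, k] proportional to
    # w_k * N(x_i | mu_k, var_k), computed in log space for stability.
    diff = X[:, None] - means[None, :]
    log_p = -0.5 * (diff**2 / variances + np.log(2.0 * np.pi * variances))
    log_p += np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)
    r = np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    nk = r.sum(axis=0)
    means = (r * X[:, None]).sum(axis=0) / nk
    diff = X[:, None] - means[None, :]
    variances = (r * diff**2).sum(axis=0) / nk + 1e-9
    weights = nk / len(X)
    return means, variances, weights

def scaled_em(stream, k, epsilon, delta, n0=1000, max_steps=50, seed=0):
    """Run EM on samples drawn from `stream`, doubling the sample size
    until the Hoeffding radius on the mean estimates falls below epsilon.
    This mirrors the shape of the method (minimize work subject to a
    probabilistic loss constraint), not its actual bound."""
    rng = np.random.default_rng(seed)
    X = stream(n0)
    means = rng.choice(X, size=k, replace=False)
    variances = np.full(k, X.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(max_steps):
        means, variances, weights = em_step(X, means, variances, weights)
        # Crude surrogate for L(M_n, M_inf): treat the observed data range
        # as the Hoeffding range parameter for each component mean.
        if hoeffding_radius(len(X), X.max() - X.min(), delta) <= epsilon:
            break
        X = np.concatenate([X, stream(len(X))])  # double n_i and refine
    return means, variances, weights

# Example: an effectively infinite stream from a two-component mixture.
rng = np.random.default_rng(42)
def stream(n):
    z = rng.random(n) < 0.4
    return np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 0.5, n))

means, variances, weights = scaled_em(stream, k=2, epsilon=0.05, delta=0.05)
```

One design note: doubling $n_i$ keeps the total work within a constant factor of the final step's work, which is why the runtime of such a scheme depends on $\epsilon$ and $\delta$ rather than on the size of the data set.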

 
 

