Policy Search via Density Estimation

 Andrew Y. Ng, Ronald Parr and Daphne Koller
Abstract:
We propose a new approach to the problem of searching a space of stochastic controllers for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP). Following several other authors, our approach is based on searching in parameterized families of policies (for example, via gradient descent) to optimize solution quality. But rather than trying to estimate the values and derivatives of a policy directly, we do so indirectly using an estimate of the probability density that the policy induces on the states at different points in time. This enables our algorithms to exploit the many techniques for efficient and robust approximate density propagation in stochastic systems. We show how our techniques can be applied both to deterministic propagation schemes (where the MDP's dynamics are given explicitly in compact form) and to stochastic propagation schemes (where we have access only to a generative model, or simulator, of the MDP). We present empirical results for both of these variants on complex problems: one with a continuous state space and complex nonlinear dynamics, and one with a large discrete state space and a transition model specified using a dynamic Bayesian network.
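As a rough illustration of the idea, the sketch below works through the deterministic-propagation variant on a toy discrete MDP with an explicitly given transition model: a softmax policy induces a Markov chain over states, the state density is pushed forward through that chain rather than estimated from sampled trajectories, and the resulting value estimate is improved by gradient ascent. The toy MDP, the softmax parameterization, and the finite-difference gradient are illustrative assumptions, not details taken from the paper.

# Minimal sketch of density-based policy evaluation and search on a toy
# discrete MDP with an explicit transition model (the "deterministic
# propagation" case). The MDP, the softmax policy, and the use of
# finite-difference gradient ascent are illustrative assumptions, not
# details taken from the paper.

import numpy as np

S, A, HORIZON = 5, 2, 20                         # states, actions, horizon
rng = np.random.default_rng(0)

# Explicit dynamics P[a, s, s'] = Pr(s' | s, a) and a state-dependent reward.
P = rng.dirichlet(np.ones(S), size=(A, S))       # shape (A, S, S), rows sum to 1
R = rng.uniform(size=S)                          # reward for visiting each state
d0 = np.ones(S) / S                              # initial state density

def policy(theta):
    """Stochastic softmax policy: pi[s, a] = Pr(a | s) under parameters theta."""
    logits = theta.reshape(S, A)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def policy_value(theta):
    """Estimate the policy's value by propagating the state density forward,
    rather than by estimating it from sampled trajectories."""
    pi = policy(theta)
    # Policy-induced Markov chain: M[s, s'] = sum_a pi(a | s) * P(s' | s, a).
    M = np.einsum('sa,ast->st', pi, P)
    d, value = d0.copy(), 0.0
    for _ in range(HORIZON):
        value += d @ R                           # expected reward under d_t
        d = d @ M                                # density propagation: d_{t+1} = d_t M
    return value

def policy_search(iters=200, step=0.5, eps=1e-4):
    """Improve the density-based value estimate by finite-difference gradient ascent."""
    theta = np.zeros(S * A)
    for _ in range(iters):
        base = policy_value(theta)
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            bumped = theta.copy()
            bumped[i] += eps
            grad[i] = (policy_value(bumped) - base) / eps
        theta += step * grad
    return theta

if __name__ == "__main__":
    print(f"value of uniform policy: {policy_value(np.zeros(S * A)):.3f}")
    print(f"value after search:      {policy_value(policy_search()):.3f}")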