Abstract:
We propose a new approach to the problem of searching a space
of stochastic controllers for a Markov decision process (MDP) or a
partially observable Markov decision process (POMDP). Following
several other authors, we search within
parameterized families of policies (for example, via gradient
descent) to optimize solution quality. But rather than trying to
estimate the values and derivatives of a policy directly, we do so
indirectly, using an estimate of the probability density that the
policy induces over the states at different points in time. This
enables our algorithms to exploit the many techniques for
efficient
and robust approximate density propagation in stochastic systems.
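
To make the density-based viewpoint concrete, here is a minimal
sketch (illustrative only, not the paper's algorithm) that evaluates a
stochastic policy on a small discrete MDP by propagating the state
density d_t forward through the dynamics and scoring the policy
against those densities; all names (n_states, n_actions, P, R, pi,
horizon) are assumptions made for the example.

    import numpy as np

    # Illustrative exact density propagation for a tiny discrete MDP:
    #   d_{t+1}(s') = sum_{s,a} d_t(s) * pi(a|s) * P(a)[s, s'],
    # and the policy's value is estimated as sum_t E_{s ~ d_t}[R(s)].
    n_states, n_actions, horizon = 5, 2, 20
    rng = np.random.default_rng(0)

    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a][s, s']
    R = rng.uniform(size=n_states)                                    # reward R(s)
    pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # pi[s, a]

    d = np.zeros(n_states)
    d[0] = 1.0                    # initial density: start in state 0
    value = 0.0
    for t in range(horizon):
        value += d @ R            # E_{s ~ d_t}[R(s)]
        d = sum((d * pi[:, a]) @ P[a] for a in range(n_actions))
    print(value)
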
We show how our techniques can be applied both to deterministic
propagation schemes (where the MDP's dynamics are given explicitly
in compact form) and to stochastic propagation schemes (where we
have access only to a generative model, or simulator, of the MDP).
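
For the stochastic case, a minimal sketch (again illustrative only,
with the toy functions step and sample_action standing in for an
assumed simulator and policy) approximates each d_t by a set of
sampled particles and estimates the policy's value by Monte Carlo:

    import numpy as np

    def step(s, a, rng):
        """Hypothetical generative model: noisy 1-D walk; reward favors the origin."""
        s2 = s + (1.0 if a == 1 else -1.0) + rng.normal(scale=0.5)
        return s2, -abs(s2)

    def sample_action(s, theta, rng):
        """Hypothetical stochastic policy: Pr(a=1 | s) = 1 / (1 + exp(theta * s))."""
        return int(rng.random() < 1.0 / (1.0 + np.exp(theta * s)))

    def estimate_value(theta, s0=0.0, horizon=20, n_particles=500, seed=0):
        """Monte Carlo estimate of sum_t E_{s ~ d_t}[r_t] from simulator calls alone."""
        rng = np.random.default_rng(seed)
        particles = [s0] * n_particles          # particle approximation of d_0
        total = 0.0
        for _ in range(horizon):
            nxt, rewards = [], []
            for s in particles:
                a = sample_action(s, theta, rng)
                s2, r = step(s, a, rng)         # one call to the generative model
                nxt.append(s2)
                rewards.append(r)
            total += np.mean(rewards)           # approximates E_{s ~ d_t}[r]
            particles = nxt                     # particles now approximate d_{t+1}
        return total

    print(estimate_value(theta=0.5))
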
We present empirical results for both variants on two challenging
problems: one with a continuous state space and complex
nonlinear dynamics, and one with a large discrete state space and a
transition model specified using a dynamic Bayesian network.