Abstract:
We present a Monte Carlo algorithm for learning to act
optimally in partially observable Markov decision processes
(POMDPs). Our approach uses importance sampling for representing
beliefs, and Monte Carlo approximation for belief revision.
Reinforcement learning (value iteration) is employed to learn value
functions over belief states, and a sample-based version of
nearest neighbor is used to generalize across states. Our approach
departs from previous work in the POMDP field in that it can handle
real-valued state spaces. Initial empirical results suggest that
our approach may work well in practical applications.
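The sketch below illustrates one plausible reading of the sampled belief
representation and Monte Carlo belief revision the abstract mentions: a
belief is held as a set of particles, propagated through a transition
model, re-weighted by importance sampling against the observation
likelihood, and resampled. The transition_model and
observation_likelihood functions and all numeric parameters are
hypothetical stand-ins, not the models used in the paper.

    import math
    import random

    def transition_model(state, action):
        # Hypothetical real-valued dynamics: next state = state + action + noise.
        return state + action + random.gauss(0.0, 0.1)

    def observation_likelihood(observation, state):
        # Hypothetical Gaussian sensor model p(o | s).
        diff = observation - state
        return math.exp(-0.5 * diff * diff / 0.04)

    def belief_update(particles, action, observation, n_particles=100):
        # Monte Carlo belief revision: propagate, weight, resample.
        propagated = [transition_model(s, action) for s in particles]
        weights = [observation_likelihood(observation, s) for s in propagated]
        if sum(weights) == 0.0:
            return propagated  # degenerate case: keep unweighted samples
        return random.choices(propagated, weights=weights, k=n_particles)

    # Usage: start from a broad prior belief, then fold in one
    # action/observation pair.
    belief = [random.gauss(0.0, 1.0) for _ in range(100)]
    belief = belief_update(belief, action=0.5, observation=0.6)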