Abstract:
A policy iteration algorithm for partially observable Markov
decision processes is described that is simpler and more efficient
than an earlier policy iteration algorithm of Sondik. The key
simplification is the representation of a policy as a finite-state
controller. The dynamic-programming update used in the policy
improvement step is interpreted as the transformation of a
finite-state controller into an improved finite-state controller.
Empirical testing shows that this policy iteration algorithm
outperforms value iteration on a range of examples.
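As a rough illustration of the policy representation described above, the following Python sketch models a policy as a finite-state controller whose nodes each prescribe an action and map observations to successor nodes. This is a minimal sketch under assumed conventions: the class names, the two-node example, and the tiger-style observations are hypothetical and not taken from the paper.

```python
# Sketch of a finite-state controller representing a POMDP policy.
# Each node prescribes an action; the observation received after
# acting selects the successor node. Names here are illustrative.
from dataclasses import dataclass, field


@dataclass
class ControllerNode:
    action: str                                      # action executed in this node
    successor: dict = field(default_factory=dict)    # observation -> next node id


@dataclass
class FiniteStateController:
    nodes: dict                                      # node id -> ControllerNode
    start: int                                       # initial node id

    def step(self, node_id: int, observation: str) -> tuple:
        """Return the action for the current node and the successor
        node selected by the observation received."""
        node = self.nodes[node_id]
        return node.action, node.successor[observation]


# Hypothetical two-node controller for a toy tiger-style problem.
fsc = FiniteStateController(
    nodes={
        0: ControllerNode("listen", {"growl-left": 1, "growl-right": 0}),
        1: ControllerNode("open-right", {"growl-left": 0, "growl-right": 0}),
    },
    start=0,
)
action, next_node = fsc.step(fsc.start, "growl-left")
```

In this representation, the policy improvement step can be viewed as adding, merging, or rewiring controller nodes to produce an improved controller, rather than recomputing a value function over the full belief space.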