Abstract:
We consider the problem of learning to attain multiple goals
in a dynamic environment, which is initially unknown. In
addition, the environment may contain arbitrarily varying
elements related to actions of other agents or to non-stationary
moves of Nature. This problem is modelled as a stochastic
(Markov) game between the learning agent and an arbitrary player,
with a vector-valued reward function. The objective of the
learning agent is to have its long-term average reward vector
belong to a given target set. We devise an algorithm for
achieving this task, which is based on the theory of
approachability for stochastic games. This algorithm combines, in
an appropriate way, a finite set of standard, scalar-reward
learning algorithms. Sufficient conditions are given for the
convergence of the learning algorithm to a general target set.
The specialization of these results to the single-controller
Markov decision problem is discussed as well.
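To give a rough sense of the approachability-based combination described above, the following is a minimal, hypothetical sketch (not the paper's algorithm): it keeps a running average of the vector-valued reward, projects it onto a target set, and hands the resulting steering direction to whichever fixed scalar-reward learner is best aligned with it. All names (TargetSet, the weight-indexed learners, the toy environment step) are illustrative assumptions.

    import numpy as np

    class TargetSet:
        """Axis-aligned box target set; project() returns the closest point in the set."""
        def __init__(self, low, high):
            self.low, self.high = np.asarray(low, float), np.asarray(high, float)

        def project(self, x):
            return np.clip(x, self.low, self.high)

    def steering_choice(avg_reward, target, learners):
        """Pick the scalar-reward learner whose weight vector is best aligned with the
        direction from the current average reward vector toward the target set."""
        direction = target.project(avg_reward) - avg_reward
        if np.allclose(direction, 0.0):       # already inside the target set
            return None
        scores = [float(np.dot(w, direction)) for w, _ in learners]
        return learners[int(np.argmax(scores))]

    # Toy usage: two reward components, target set [0.4, 1.0] x [0.4, 1.0].
    rng = np.random.default_rng(0)
    target = TargetSet(low=[0.4, 0.4], high=[1.0, 1.0])
    # Two hypothetical scalar-reward learners, each tied to one reward component.
    learners = [(np.array([1.0, 0.0]), "maximize component 1"),
                (np.array([0.0, 1.0]), "maximize component 2")]

    avg, n = np.zeros(2), 0
    for t in range(10_000):
        chosen = steering_choice(avg, target, learners)
        # Stand-in for one environment step under the chosen learner's policy:
        # the selected component gets an extra boost, everything else is noise.
        r = rng.uniform(0.0, 0.6, size=2)
        if chosen is not None:
            r += 0.5 * chosen[0]
        n += 1
        avg += (r - avg) / n       # incremental update of the average reward vector

    print("final average reward vector:", np.round(avg, 3))
    print("distance to target set:", np.linalg.norm(target.project(avg) - avg))

In this toy loop the steering direction plays the role of the approachability-based switching rule: whenever the running average drifts outside the target set, the agent temporarily behaves like the scalar-reward learner that pushes the average back toward the set.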