Abstract:
Reinforcement learning in nonstationary environments is
generally regarded as an important yet difficult problem. This
paper partially addresses the problem by formalizing a subclass of
nonstationary environments. The environment model, called
hidden-mode Markov decision process (HM-MDP), assumes that
environmental changes are always confined to a small number of
hidden modes. A mode basically indexes a Markov decision process
(MDP) and evolves with time according to a Markov chain. While
HM-MDP is a special case of partially observable Markov decision
processes (POMDPs), modeling an HM-MDP environment via the more
general POMDP model unnecessarily increases the problem complexity.
A variant of the Baum-Welch algorithm is developed for model
learning; it requires less data and computation time than learning
the corresponding POMDP.
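
To make the model concrete, here is a minimal NumPy sketch of the structure the abstract describes: a hidden mode that indexes one of several MDPs and evolves according to a Markov chain, plus the forward belief update over modes that a Baum-Welch-style learner would run in its E-step. All class and function names, array layouts, and the placement of the mode transition within the filtering recursion are illustrative assumptions, not the paper's notation.

```python
import numpy as np

class HMMDP:
    """Hypothetical container for a hidden-mode MDP: several MDPs that
    share states and actions, indexed by a hidden mode that evolves
    according to a Markov chain."""

    def __init__(self, mode_trans, trans, rewards, seed=0):
        # mode_trans[i, j]  : P(next mode = j | current mode = i)
        # trans[m, s, a, t] : P(next state = t | mode m, state s, action a)
        # rewards[m, s, a]  : expected reward in mode m for (s, a)
        self.mode_trans = mode_trans
        self.trans = trans
        self.rewards = rewards
        self.n_modes, self.n_states = trans.shape[0], trans.shape[1]
        self.rng = np.random.default_rng(seed)
        self.mode, self.state = 0, 0  # mode is hidden; state is observed

    def step(self, action):
        # The agent observes the state and reward; the mode stays hidden.
        reward = self.rewards[self.mode, self.state, action]
        self.state = self.rng.choice(
            self.n_states, p=self.trans[self.mode, self.state, action])
        self.mode = self.rng.choice(
            self.n_modes, p=self.mode_trans[self.mode])
        return self.state, reward


def mode_belief_update(belief, mode_trans, trans, s, a, s_next):
    """One forward step of the belief over hidden modes, the kind of
    recursion a Baum-Welch-style E-step runs along a trajectory: weight
    each mode by how well it explains the observed (s, a, s'), then
    propagate the belief through the mode transition matrix."""
    weighted = belief * trans[:, s, a, s_next]  # P(s'|s,a) under each mode
    unnormalized = weighted @ mode_trans        # push belief one step ahead
    return unnormalized / unnormalized.sum()
```

Note that because only the mode is hidden while the state itself is observed, this belief vector has one entry per mode; the equivalent POMDP would instead maintain a belief over mode-state pairs, which is one way to read the abstract's claim that the general POMDP formulation unnecessarily inflates the problem.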