Abstract:
We describe BLHT, a reinforcement learning algorithm for partially
observable environments that uses short-term memory. Because BLHT
learns a stochastic model by Bayesian learning, it largely avoids
the overfitting problem. Moreover, BLHT admits an efficient
implementation. This paper shows that the model learned by BLHT
converges to one that provides the most accurate predictions of
percepts and rewards, given short-term memory.