|
Abstract:
This paper addresses Reinforcement Learning (RL) for stochastic
control problems with continuous state space and continuous time.
We state the Hamilton-Jacobi-Bellman equation satisfied by the
value function and use a finite-difference method to design a
convergent approximation scheme. We then propose an RL algorithm
based on this scheme and prove its convergence to the optimal
solution.
|