Abstract:
This paper describes some of the interactions between model
learning algorithms and planning algorithms that we have found in
exploring model-based reinforcement learning. The paper focuses on
how local trajectory optimizers can be used effectively with
learned nonparametric models. We find that trajectory planners that
are fully consistent with the learned model often have difficulty
finding reasonable plans in the early stages of learning.
Planners that balance obeying the learned model against
minimizing cost often do better, even when the resulting plan is
not fully consistent with the learned model.
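
To make the distinction concrete, the sketch below (not the paper's implementation) illustrates one way to treat a learned dynamics model as a soft penalty, weighted by lam, rather than as a hard constraint during trajectory optimization. The learned_model and task_cost functions and all parameter values here are placeholder assumptions for illustration only.

```python
import jax
import jax.numpy as jnp

# Placeholder standing in for a learned (e.g., nonparametric) dynamics
# model f(x, u) -> predicted next state. Purely illustrative.
def learned_model(x, u):
    return x + 0.1 * u

# Placeholder per-step task cost: drive the state to zero, penalize effort.
def task_cost(x, u):
    return jnp.sum(x ** 2) + 0.01 * jnp.sum(u ** 2)

def soft_consistency_objective(traj, lam):
    xs, us = traj["xs"], traj["us"]
    # Task cost summed over the trajectory.
    c = jnp.sum(jax.vmap(task_cost)(xs[:-1], us))
    # Soft penalty for disagreeing with the learned model, instead of
    # forcing the plan to be fully consistent with it.
    preds = jax.vmap(learned_model)(xs[:-1], us)
    return c + lam * jnp.sum((xs[1:] - preds) ** 2)

grad_fn = jax.grad(soft_consistency_objective)

def optimize(traj, lam=10.0, steps=500, lr=1e-2):
    for _ in range(steps):
        g = grad_fn(traj, lam)
        # Keep the start state fixed; optimize the rest of the plan.
        g["xs"] = g["xs"].at[0].set(0.0)
        traj = {k: traj[k] - lr * g[k] for k in traj}
    return traj

# Usage: a horizon-10 plan with a 2-D state and 1-D action.
T, xdim, udim = 10, 2, 1
traj = {"xs": jnp.ones((T + 1, xdim)), "us": jnp.zeros((T, udim))}
traj = optimize(traj)
```

A large lam approximates a fully model-consistent planner; a moderate lam lets the optimizer trade model agreement for lower cost, which is the balance discussed above.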