Abstract:
We present three ways of combining linear programming with the
kernel trick to find value function approximations for
reinforcement learning. One formulation is based on SVM
regression; the second is based on the Bellman equation; and the
third seeks only to ensure that good moves have an advantage over
bad moves. All formulations attempt to minimize the number of
support vectors while fitting the data. Experiments in a
difficult, synthetic maze problem show that all three
formulations give excellent performance, but the advantage
formulation is much easier to train. Unlike policy gradient
methods, the kernel methods described here can easily adjust the
complexity of the function approximator to fit the complexity of
the value function.