Abstract:
We present a simple approach for computing reasonable policies
for factored Markov decision processes (MDPs), when the optimal
value function can be approximated by a compact linear form. Our
method is based on solving a single linear program that
approximates the best linear fit to the optimal value function.
By applying an efficient constraint generation procedure, we
obtain an iterative solution method that solves only compact
linear programs. This direct linear programming approach experimentally
yields a significant reduction in computation time over
approximate value- and policy-iteration methods (sometimes
reducing several hours to a few seconds). However, the quality of
the solutions produced by linear programming is weaker: usually
about twice the approximation error for the same approximating
class. Nevertheless, the speed advantage allows one to use larger
approximation classes to achieve similar error in reasonable
time.
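
As a rough illustration of the approach the abstract describes, the sketch below implements generic approximate linear programming (ALP) with constraint generation for a small, explicitly enumerated MDP. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the names P, R, Phi, and alp_constraint_generation are hypothetical placeholders, the state space is enumerated directly, and scipy's linprog stands in for a production LP solver.

```python
# Minimal sketch (not the paper's exact method): generic ALP for a small
# explicit MDP, solved iteratively by constraint generation. All names
# (P, R, Phi, alp_constraint_generation) are illustrative placeholders.
import numpy as np
from scipy.optimize import linprog

def alp_constraint_generation(P, R, Phi, gamma=0.95, tol=1e-6, max_iters=100):
    """P[a]: |S|x|S| transition matrix, R[a]: length-|S| rewards,
    Phi: |S|xk feature matrix defining V(s) = phi(s)^T w."""
    n_states, k = Phi.shape
    c = Phi.sum(axis=0)                       # minimize sum_s V(s) = sum_s phi(s)^T w
    box = [(-1e3, 1e3)] * k                   # box bounds keep early LPs bounded
    active = [(0, a) for a in range(len(P))]  # seed with one state's constraints
    for _ in range(max_iters):
        # Bellman constraint  phi(s)^T w >= R[a][s] + gamma * (P[a][s] Phi) w,
        # rewritten into linprog's  A_ub w <= b_ub  form:
        A_ub = np.array([gamma * (P[a][s] @ Phi) - Phi[s] for s, a in active])
        b_ub = np.array([-R[a][s] for s, a in active])
        w = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=box).x
        V = Phi @ w
        # Constraint generation: add only the most violated (s, a) constraint.
        worst, gap = None, tol
        for a in range(len(P)):
            viol = R[a] + gamma * (P[a] @ V) - V
            s = int(np.argmax(viol))
            if viol[s] > gap:
                worst, gap = (s, a), viol[s]
        if worst is None:                     # all Bellman constraints hold
            return w
        active.append(worst)
    return w
```

Note that this sketch finds the most violated constraint by brute-force enumeration of state-action pairs; the "efficient constraint generation procedure" the abstract refers to is precisely what avoids such enumeration when the MDP is factored and the state space is exponential.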