Monthly
288 pp. per issue
6 x 9, illustrated
ISSN: 0899-7667
E-ISSN: 1530-888X
2014 Impact factor: 2.21

Neural Computation

November 2011, Vol. 23, No. 11, Pages 2798-2832
(doi: 10.1162/NECO_a_00199)
© 2011 Massachusetts Institute of Technology
Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Abstract

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples to obtain a stable policy update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)