
A natural policy gradient

Sham Kakade

Abstract:

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient moves toward choosing a greedy optimal action rather than merely a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
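The natural gradient in question is the vanilla policy gradient premultiplied by the inverse of the policy's Fisher information matrix. The following is a minimal sketch of that idea, not the paper's experimental setup: it compares parameter updates for a softmax policy on a single-state MDP (a bandit), using exact expectations instead of samples; the damping term eps (added to keep the singular Fisher matrix invertible) and all names are illustrative assumptions.

import numpy as np

# Sketch (assumptions, not the paper's setup): a single-state MDP (bandit)
# with known per-action rewards, exact expectations instead of sampled
# trajectories, and a small damping term eps for invertibility.

rewards = np.array([1.0, 2.0, 5.0])   # reward for each action
theta = np.zeros(3)                   # softmax policy parameters
eps = 1e-5                            # damping term (assumption)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(200):
    pi = softmax(theta)
    # Vanilla gradient of J(theta) = E_pi[r]; for a softmax policy,
    # grad log pi(a) = e_a - pi, so grad J = pi * r - pi * (pi . r).
    grad = pi * rewards - pi * (pi @ rewards)
    # Fisher information matrix of the softmax policy:
    # F = E_pi[(grad log pi)(grad log pi)^T] = diag(pi) - pi pi^T.
    fisher = np.diag(pi) - np.outer(pi, pi)
    # Natural gradient: steepest ascent under the Fisher metric.
    nat_grad = np.linalg.solve(fisher + eps * np.eye(3), grad)
    theta += 0.1 * nat_grad

print(softmax(theta))   # mass concentrates on the greedy action (reward 5.0)

Because the Fisher matrix rescales the update according to the geometry of the policy distribution rather than the raw parameterization, the damped natural step keeps pointing toward the greedy action even as the vanilla gradient shrinks when the policy's probabilities approach 0 or 1.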

References

[9] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, 2000.

 
 

