MIT CogNet, The Brain Sciences Connection, from the MIT Press
Convergence of optimistic and incremental Q-learning

 Eyal Even-Dar and Yishay Mansour
  
 

Abstract:

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows the greedy policy with respect to those Q-values. We show that setting the initial values sufficiently large guarantees convergence to an ε-optimal policy. The second is a new algorithm, incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Incremental Q-learning can be viewed as a derandomization of ε-greedy Q-learning.
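To make the optimistic variant concrete, the following is a minimal sketch, not the paper's construction or analysis, of optimistic Q-learning on a hypothetical two-state deterministic MDP. The environment, state and action names, and constants are all illustrative assumptions; only the mechanism (Q-values initialized above any attainable value, greedy action selection, deterministic Bellman updates) reflects the abstract's description.

```python
# Hypothetical sketch of optimistic Q-learning on a toy deterministic MDP.
# All environment details below are illustrative assumptions, not from the paper.

GAMMA = 0.9

# Deterministic toy MDP: (state, action) -> (next_state, reward).
# The optimal behavior is to reach s1 and loop there via "right".
TRANSITIONS = {
    ("s0", "left"):  ("s0", 0.0),
    ("s0", "right"): ("s1", 0.0),
    ("s1", "left"):  ("s0", 0.0),
    ("s1", "right"): ("s1", 1.0),  # rewarding self-loop
}
ACTIONS = ["left", "right"]

# Optimistic initialization: start every Q-value above the largest
# attainable value, max_reward / (1 - gamma) = 10, so each action
# looks attractive until it has actually been tried.
Q_INIT = 1.0 / (1.0 - GAMMA) + 1.0
Q = {sa: Q_INIT for sa in TRANSITIONS}

state = "s0"
for _ in range(200):
    # Greedy action with respect to the current (optimistic) Q-values.
    action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = TRANSITIONS[(state, action)]
    # Deterministic transitions allow a full Bellman update (no step size).
    Q[(state, action)] = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    state = next_state

# Greedy policy after learning: both states should choose "right".
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in ("s0", "s1")}
print(policy)  # → {'s0': 'right', 's1': 'right'}
```

Because every Q-value starts above its true value, the greedy agent tries each action before its estimate can drop below a competitor's, which is the exploration role that randomization (e.g. ε-greedy) plays in the stochastic variants.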

 
 


© 2010 The MIT Press