Neural Computation

March 1995, Vol. 7, No. 2, Pages 270-279
(doi: 10.1162/neco.1995.7.2.270)
© 1995 Massachusetts Institute of Technology
A Counterexample to Temporal Differences Learning
Abstract

Sutton's TD(λ) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on λ. For λ = 1 the representation is optimal with respect to a least-squares error criterion, but as λ decreases toward 0 the representation becomes progressively worse and, in some cases, very poor. The example suggests a need to better understand the circumstances under which TD(0) and Q-learning obtain satisfactory neural-network-based compact representations of the cost function. A variation of TD(0) that performs better on the example is also given.
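
For context, the following is a minimal sketch of TD(λ) with a linear approximation architecture on an undiscounted absorbing Markov chain with transition costs, the setting described above. The function names (td_lambda, step), the feature map, and the toy two-state chain are illustrative assumptions, not the paper's counterexample.

import numpy as np

def td_lambda(phi, step, start_state, lam, alpha=0.01, n_episodes=2000, seed=0):
    """Approximate the cost-to-go V(s) = phi(s) . w by TD(lambda).
    `step(s, rng)` returns (next_state, transition_cost, absorbed)."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(phi(start_state), dtype=float)
    for _ in range(n_episodes):
        s = start_state
        z = np.zeros_like(w)                      # eligibility trace
        absorbed = False
        while not absorbed:
            s_next, cost, absorbed = step(s, rng)
            v_next = 0.0 if absorbed else phi(s_next) @ w
            delta = cost + v_next - phi(s) @ w    # temporal-difference error
            z = lam * z + phi(s)                  # decay trace, add current features
            w = w + alpha * delta * z             # update weights along the trace
            s = s_next
    return w

# Illustrative two-state chain: 0 -> 1 -> absorbing, each transition costs 1,
# so the true cost-to-go is V(0) = 2, V(1) = 1.
def step(s, rng):
    return (1, 1.0, False) if s == 0 else (s, 1.0, True)

phi = lambda s: np.array([1.0, float(s)])         # shared linear features
w1 = td_lambda(phi, step, start_state=0, lam=1.0)
w0 = td_lambda(phi, step, start_state=0, lam=0.0)

On this toy chain both settings recover the exact cost function, since the features can represent it; the paper's point is that with a compact architecture that cannot represent the cost function exactly, the quality of the fixed point can degrade as λ moves from 1 toward 0.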