TR available

Tue Dec 17 16:11:23 EST 1991

The following technical report is now available. (This is a
long version of the paper to appear in the next NIPS proceedings.)
To obtain a copy, send a message to "tesauro at watson.ibm.com" and
be sure to include your PHYSICAL mail address.

       Practical Issues in Temporal Difference Learning

                      Gerald Tesauro
            IBM Thomas J. Watson Research Center
         PO Box 704, Yorktown Heights, NY 10598 USA

Abstract:
This paper examines whether temporal difference methods
for training connectionist networks, such as Sutton's
TD($\lambda$) algorithm, can be successfully applied to
complex real-world problems.  A number of important practical
issues are identified and discussed from a general theoretical
perspective.  These practical issues are then examined in
the context of a case study in which TD($\lambda$)
is applied to learning the game of backgammon from the outcome of
self-play.  This is apparently the first application of this
algorithm to a complex nontrivial task.  It is found that,
with zero knowledge built in, the network is able to learn from
scratch to play the entire game at a fairly strong
intermediate level of performance,
which is clearly better than conventional commercial programs,
and which in fact surpasses comparable networks trained
on a massive human expert data set.
This indicates that TD learning may work better in practice
than one would expect based on current theory, and it suggests
that further analysis of TD methods, as well as applications
in other complex domains, may be worth investigating.
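
For readers unfamiliar with TD($\lambda$), the following is a minimal
sketch of the update rule learning from final outcomes only. It is not
taken from the report: the toy task (a one-dimensional random walk), the
tabular value estimate, the function name td_lambda_random_walk, and all
parameter values are assumptions chosen for illustration. The report
itself concerns a connectionist network trained on backgammon self-play.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.02, lam=0.7, seed=0):
    """Estimate state values for a toy random walk with TD(lambda).

    Hypothetical illustration, not the report's setup. States 0..n_states-1
    are non-terminal. Stepping left of state 0 ends the episode with
    outcome 0; stepping right of the last state ends it with outcome 1.
    The only training signal is that final outcome (no discounting).
    """
    rng = random.Random(seed)
    values = [0.5] * n_states              # one estimate per state (tabular case)
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset each episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target, done = 0.0, True   # terminal outcome on the left
            elif next_state >= n_states:
                target, done = 1.0, True   # terminal outcome on the right
            else:
                target, done = values[next_state], False
            # Temporal difference: successive prediction minus current prediction.
            td_error = target - values[state]
            # Decay every trace by lambda, then mark the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0
            # Credit the error to recently visited states via their traces.
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    for s, v in enumerate(td_lambda_random_walk()):
        print(f"state {s}: estimated probability of right-side outcome {v:.3f}")
```

With lam=0 each error adjusts only the current state's estimate; with
lam=1 it is spread over every state visited so far in the episode, which
approaches training directly on the final outcome. Replacing the table
with a network means replacing the one-hot trace increment with the
gradient of the network output with respect to its weights.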