On Step-Size and Bias in TD Learning (Paper Available)

Rich Sutton sutton at gte.com
Tue Aug 2 15:54:04 EDT 1994


The following paper appeared in the proceeding of the 1994 Yale Workshop on
Adaptive and Learning Systems, 1994.  ftp instructions follow.


        ON STEP-SIZE AND BIAS IN TEMPORAL-DIFFERENCE LEARNING

           by Richard S. Sutton and Satinder P. Singh (MIT)
                                       
We present results for three new algorithms for setting the step-size
parameters, alpha and lambda, of temporal-difference learning methods such
as TD(lambda).  The overall task is that of learning to predict the outcome
of an unknown Markov chain based on repeated observations of its state
trajectories. The new algorithms select step-size parameters online in such
a way as to eliminate the bias normally inherent in temporal-difference
methods. We compare our algorithms with conventional Monte Carlo methods.
Monte Carlo methods have a natural way of setting the step size: for each
state s they use a step size of  1/n(s), where n(s) is the number of times
state s has been visited.  We seek and come close to achieving comparable
step-size algorithms for TD(lambda). One new algorithm uses a lambda=1/n(s)
schedule to achieve the same effect as processing a state backwards with
TD(0), but remains completely incremental. Another algorithm uses a lambda
at each time equal to the estimated transition probability of the current
transition. We present empirical results showing improvement in convergence
rate over Monte Carlo methods and conventional TD(lambda).  A limitation of
our results at present is that they apply only to tasks whose state
trajectories do not contain cycles.  

================================================================
unix> ftp envy.cs.umass.edu
 
Name: anonymous
Password: [your ID]
ftp> cd pub/singh
ftp> binary
ftp> get Yale94.ps.Z
ftp> bye
 
unix> uncompress Yale94.ps.Z
unix> [your command to print PostScript file] Yale94.ps
===============================================================
The paper is six pages long.  The file is about 130K.

FTP-host: envy.cs.umass.edu
FTP-filename: /pub/singh/Yale94.ps.Z




More information about the Connectionists mailing list