Paper : Convergence of TD(lambda)

Peter Dayan dayan at helmholtz.sdsc.edu
Wed Mar 3 20:42:24 EST 1993


A PostScript version of the following paper has been placed in the
neuroprose archive. It has been submitted to Machine Learning, and
comments/questions/refutations are eagerly solicited.

Hard-copies are not available.

*****************************************************************

	       TD(lambda) Converges with Probability 1
				   
		 Peter Dayan and Terrence J Sejnowski
		       CNL, The Salk Institute
		    10010 North Torrey Pines Road
			  La Jolla, CA 92037


The methods of temporal differences allow agents to learn accurate
predictions of stationary stochastic future outcomes. The learning
is effectively stochastic approximation based on samples extracted
from the process generating an agent's future.

Sutton proved that, for a special case of temporal differences, the
expected values of the predictions converge to their correct values
as larger samples are taken, and this proof has been extended to the
case of general lambda.  This paper proves the stronger result that
the predictions of a slightly modified form of temporal difference
learning converge with probability one, and shows how to quantify the
rate of convergence.
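
For readers who want a concrete picture of the update being analysed,
here is a minimal, illustrative TD(lambda) sketch in Python. The
five-state random walk, the step size ALPHA, and the LAMBDA value are
assumptions made for illustration only and are not taken from the paper.

    # Minimal tabular TD(lambda) prediction sketch (assumed example, not the
    # paper's notation): estimate the probability of absorbing on the right
    # of a 5-state random walk, using accumulating eligibility traces.
    import random

    N_STATES = 5          # non-terminal states 0..4
    ALPHA = 0.1           # step-size (learning-rate) parameter
    LAMBDA = 0.8          # eligibility-trace decay parameter

    V = [0.5] * N_STATES  # current predictions for each non-terminal state

    def run_episode():
        """One random walk from the centre state, updated on-line by TD(lambda)."""
        traces = [0.0] * N_STATES
        state = N_STATES // 2
        while True:
            next_state = state + random.choice([-1, 1])
            if next_state < 0:                    # absorbed on the left: outcome 0
                outcome, next_value, done = 0.0, 0.0, True
            elif next_state >= N_STATES:          # absorbed on the right: outcome 1
                outcome, next_value, done = 1.0, 0.0, True
            else:
                outcome, next_value, done = 0.0, V[next_state], False

            delta = outcome + next_value - V[state]   # temporal-difference error
            traces[state] += 1.0                      # mark the state just visited
            for s in range(N_STATES):
                V[s] += ALPHA * delta * traces[s]     # credit recently visited states
                traces[s] *= LAMBDA                   # decay traces toward zero
            if done:
                return
            state = next_state

    for _ in range(5000):
        run_episode()
    print([round(v, 2) for v in V])   # approaches [1/6, 2/6, 3/6, 4/6, 5/6]

With the fixed step size used here the estimates only fluctuate around
the true values; a probability-one result concerns step sizes that decay
appropriately over trials, in the usual stochastic-approximation fashion.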

*****************************************************************

----------------------------------------------------------------
FTP INSTRUCTIONS

"Getps dayan.tdl.ps.Z" if you have the shell script, or

     unix% ftp archive.cis.ohio-state.edu (or 128.146.8.52)
     Name: anonymous
     Password: neuron
     ftp> cd pub/neuroprose
     ftp> binary
     ftp> get dayan.tdl.ps.Z
     ftp> bye
     unix% zcat dayan.tdl.ps.Z | lpr
----------------------------------------------------------------


