Paper: Convergence of TD(lambda)
Peter Dayan
dayan at helmholtz.sdsc.edu
Wed Mar 3 20:42:24 EST 1993
A PostScript version of the following paper has been placed in the
neuroprose archive. It has been submitted to Machine Learning, and
comments/questions/refutations are eagerly solicited.
Hard copies are not available.
*****************************************************************
TD(lambda) Converges with Probability 1
Peter Dayan and Terrence J Sejnowski
CNL, The Salk Institute
10010 North Torrey Pines Road
La Jolla, CA 92037
The methods of temporal differences allow agents to learn accurate
predictions about stationary stochastic future outcomes. The learning
is effectively stochastic approximation based on samples extracted
from the process generating an agent's future.
Sutton has proved that, for a special case of temporal differences, the
expected values of the predictions converge to their correct values as
larger samples are taken, and this proof has been extended to the case
of general lambda. This paper proves the stronger result that the
predictions of a slightly modified form of temporal difference learning
converge with probability one, and shows how to quantify the rate of
convergence.
*****************************************************************
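For context, here is a minimal sketch of tabular TD(lambda) prediction
with accumulating eligibility traces, written in Python. It is
illustrative only: the Markov-chain interface, the reward convention,
and the 1/k step-size schedule are assumptions made for this example
and are not taken from the paper, whose modified algorithm and
convergence conditions are spelled out in the text itself.

import numpy as np

def td_lambda(transitions, rewards, n_states, lam=0.9, gamma=1.0,
              episodes=1000, seed=0):
    """Estimate the expected (discounted) outcome from each state.

    transitions: dict state -> list of (next_state, probability);
                 a next_state of None marks termination.
    rewards:     dict state -> reward received on leaving that state.
    (This interface and the step-size schedule are illustrative
    assumptions, not the paper's.)
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)            # current predictions
    for k in range(episodes):
        alpha = 1.0 / (k + 1)         # decreasing step size, of the kind
                                      # needed for probability-1 convergence
        e = np.zeros(n_states)        # eligibility traces
        s = 0                         # assume each episode starts in state 0
        while s is not None:
            nexts, probs = zip(*transitions[s])
            s2 = nexts[rng.choice(len(nexts), p=probs)]
            target = rewards[s] + (gamma * v[s2] if s2 is not None else 0.0)
            delta = target - v[s]     # temporal-difference error
            e[s] += 1.0               # accumulate trace for the visited state
            v += alpha * delta * e    # update all eligible predictions
            e *= gamma * lam          # decay traces
            s = s2
    return v

For instance, on the two-state absorbing chain
transitions = {0: [(1, 1.0)], 1: [(None, 1.0)]} with
rewards = {0: 0.0, 1: 1.0}, the call td_lambda(transitions, rewards, 2)
drives both predictions towards the true expected outcome of 1.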
----------------------------------------------------------------
FTP INSTRUCTIONS
"Getps dayan.tdl.ps.Z" if you have the shell script, or
unix% ftp archive.cis.ohio-state.edu (or 128.146.8.52)
Name: anonymous
Password: neuron
ftp> cd pub/neuroprose
ftp> binary
ftp> get dayan.tdl.ps.Z
ftp> bye
unix% zcat dayan.tdl.ps.Z | lpr
----------------------------------------------------------------