Paper on TD convergence available
Michael J. Kearns
mkearns at research.att.com
Wed Sep 1 10:41:36 EDT 1999
The following paper is now available at
http://www.research.att.com/~mkearns/papers/tdlambda.ps.Z
``Bias-Variance'' Error Bounds for Temporal Difference Updates
Michael Kearns
Satinder Singh
AT&T Labs
We give the first rigorous upper bounds on the error of temporal difference (TD)
algorithms for policy evaluation as a function of the amount of experience.
These upper bounds prove exponentially fast convergence, with both the rate of
convergence and the asymptote strongly dependent on the length of the
backups $k$ or the parameter $\lambda$.
Our bounds give formal verification to
the long-standing intuition that TD methods
are subject to a ``bias-variance'' trade-off, and they lead to
schedules for $k$ and $\lambda$ that are predicted to be better than any
fixed values for these parameters. We give preliminary experimental confirmation
of our theory for a version of the random walk problem.
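For readers unfamiliar with the role of the backup length $k$, the following is a minimal sketch of generic k-step TD policy evaluation on a small random walk. It is a textbook-style illustration, not the algorithm or analysis from the paper. The function name, the five-state chain, the reward of 1 on the right exit, and the step size are all assumptions chosen for the example.

```python
import random

def k_step_td_random_walk(k, episodes=5000, n_states=5, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with k-step TD backups.

    Hypothetical setup: states 1..n_states are non-terminal, 0 and n_states+1
    are terminal, and only exiting on the right yields reward 1.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_states + 2)  # terminal entries stay at 0
    for _ in range(episodes):
        # Generate one full episode starting from the middle state.
        state = (n_states + 1) // 2
        states, rewards = [state], []
        while 0 < state < n_states + 1:
            state += rng.choice((-1, 1))
            rewards.append(1.0 if state == n_states + 1 else 0.0)
            states.append(state)
        T = len(rewards)
        # Apply k-step backups in time order after the episode ends.
        for t in range(T):
            end = min(t + k, T)
            # Sum up to k discounted rewards, then bootstrap from the
            # current value estimate of the state reached at step `end`.
            target = sum(gamma ** (i - t) * rewards[i] for i in range(t, end))
            target += gamma ** (end - t) * values[states[end]]
            values[states[t]] += alpha * (target - values[states[t]])
    return values[1:-1]

if __name__ == "__main__":
    for k in (1, 3, 10):
        print(k, [round(v, 3) for v in k_step_td_random_walk(k)])
```

Small $k$ leans on the current (possibly wrong) value estimates, while large $k$ leans on noisy sampled returns, which is the trade-off the abstract refers to. With these settings the true value of state $i$ is $i/6$, so the estimates can be compared against that directly.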