Paper on TD convergence available

Wed Sep 1 10:41:36 EDT 1999

The following paper is now available at 

	http://www.research.att.com/~mkearns/papers/tdlambda.ps.Z

``Bias-Variance'' Error Bounds for Temporal Difference Updates

Michael Kearns
Satinder Singh

AT&T Labs

We give the first rigorous upper bounds on the error of temporal difference (TD)
algorithms for policy evaluation as a function of the amount of experience.
These upper bounds prove exponentially fast convergence, with both the rate of
convergence and the asymptote strongly dependent on the length of the
backups $k$ or the parameter $\lambda$.
Our bounds formally verify the long-standing intuition that TD methods
are subject to a ``bias-variance'' trade-off, and they lead to
schedules for $k$ and $\lambda$ that are predicted to be better than any
fixed values for these parameters. We give preliminary experimental confirmation
of our theory for a version of the random walk problem.
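
For readers unfamiliar with the role of the backup length $k$, the following is a
minimal sketch of $k$-step TD policy evaluation on a small random walk. It is not
the code or the experimental setup from the paper: the chain length, the reward of
1 at the right end, the step size alpha, the per-episode update order, and all
function names are assumptions chosen only for illustration.

```python
import random


def k_step_td_random_walk(k, episodes, alpha=0.1, n_states=5, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with k-step TD backups.

    Illustrative assumptions: states 1..n_states are non-terminal, states 0 and
    n_states + 1 are terminal, and the only nonzero reward is 1.0 on entering
    the right terminal state.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_states + 2)  # terminal values stay at 0
    for _ in range(episodes):
        # Generate one full episode starting from the middle state.
        state = (n_states + 1) // 2
        states = [state]
        rewards = []
        while 0 < state < n_states + 1:
            state += rng.choice((-1, 1))
            rewards.append(1.0 if state == n_states + 1 else 0.0)
            states.append(state)
        # Apply a k-step backup to every visited state, in time order.
        horizon = len(rewards)
        for t in range(horizon):
            end = min(t + k, horizon)
            # Discounted sum of up to k observed rewards ...
            target = sum(gamma ** (i - t) * rewards[i] for i in range(t, end))
            # ... plus the bootstrapped estimate at the state reached.
            target += gamma ** (end - t) * values[states[end]]
            values[states[t]] += alpha * (target - values[states[t]])
    return values[1:-1]


def rmse_against_truth(estimates):
    """Root-mean-square error against the exact values i / (n + 1)."""
    n = len(estimates)
    truth = [(i + 1) / (n + 1) for i in range(n)]
    return (sum((e - v) ** 2 for e, v in zip(estimates, truth)) / n) ** 0.5


if __name__ == "__main__":
    for k in (1, 3, 10):
        estimates = k_step_td_random_walk(k=k, episodes=200)
        print(k, round(rmse_against_truth(estimates), 3))
```

Small $k$ leans on the current value estimates (more bias, less variance), while
large $k$ leans on sampled returns (less bias, more variance). Running the loop
above for several values of k and episode counts gives a rough feel for that
trade-off on this toy problem.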