New TR on Kernel-Based Reinforcement Learning
Dirk Ormoneit
ormoneit at stat.Stanford.EDU
Tue May 4 13:45:46 EDT 1999
The following technical report is now available on-line at
http://www-stat.stanford.edu/~ormoneit/tr-1999-8.ps
Best,
Dirk
------------------------------------------------------------------
KERNEL-BASED REINFORCEMENT LEARNING
by
Dirk Ormoneit and Saunak Sen
Kernel-based methods have recently attracted increased attention in
the machine learning literature as reliable tools for regression and
classification tasks. In this work, we consider a kernel-based
approach to reinforcement learning and show that it produces a
consistent estimate of the true value function in a continuous
Markov Decision Process. Typically, consistency cannot be obtained
using parametric value function estimates such as neural networks.
As further contributions, we derive the asymptotic distribution of
the kernel-based estimate and establish optimal convergence rates.
The asymptotic distribution is then used to derive a formula for the
asymptotic bias inherent in the kernel-based approximation.
Although reinforcement learning estimates are generally biased
because of the maximum operator involved, this is, to our knowledge,
the first theoretical result of this kind. The suggested bias formulas
may serve as the basis for bias correction techniques that can be
used in practice to improve the estimate of the value function.
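
For readers who want a concrete picture of what a kernel-based value
function estimate can look like, here is a minimal sketch. It is not
taken from the report and should not be read as the authors'
implementation: the toy one-dimensional MDP, the Gaussian kernel, the
bandwidth, and all function names (gaussian_weights, kernel_backup,
kernel_value_iteration, make_samples) are assumptions made for
illustration. The idea shown is a Bellman backup in which the
expectation over successor states is replaced by a normalized kernel
average over sampled transitions, followed by a maximum over actions.

```python
import math
import random


def gaussian_weights(x, centers, bandwidth):
    """Normalized Gaussian kernel weights of query x against sample states."""
    raw = [math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers]
    total = sum(raw)
    return [w / total for w in raw]


def kernel_backup(x, samples, values, gamma, bandwidth):
    """One kernel-smoothed Bellman backup at state x.

    samples[a] is a list of (state, reward, next_state) transitions for
    action a; values[a][i] is the current value estimate at the
    next_state of samples[a][i].
    """
    best = -math.inf
    for action, transitions in samples.items():
        centers = [s for s, _, _ in transitions]
        weights = gaussian_weights(x, centers, bandwidth)
        # Kernel average of reward plus discounted successor value.
        q = sum(w * (r + gamma * v)
                for w, (_, r, _), v in zip(weights, transitions, values[action]))
        best = max(best, q)  # the maximum operator mentioned in the abstract
    return best


def kernel_value_iteration(samples, gamma=0.9, bandwidth=0.1, sweeps=100):
    """Iterate the backup on the sampled successor states only."""
    values = {a: [0.0] * len(ts) for a, ts in samples.items()}
    for _ in range(sweeps):
        values = {
            a: [kernel_backup(y, samples, values, gamma, bandwidth)
                for _, _, y in ts]
            for a, ts in samples.items()
        }
    return values


def make_samples(n=30, seed=0):
    """Hypothetical toy MDP on [0, 1]: move left or right with small noise.

    The reward is simply the successor state, so states near 1 are better.
    """
    rng = random.Random(seed)
    samples = {"left": [], "right": []}
    for action, step in (("left", -0.1), ("right", 0.1)):
        for _ in range(n):
            x = rng.random()
            y = min(1.0, max(0.0, x + step + rng.gauss(0.0, 0.02)))
            samples[action].append((x, y, y))
    return samples


samples = make_samples()
values = kernel_value_iteration(samples)
v_low = kernel_backup(0.1, samples, values, 0.9, 0.1)
v_high = kernel_backup(0.9, samples, values, 0.9, 0.1)
```

Because the kernel weights are nonnegative and sum to one, the backup
in this sketch is a contraction with modulus gamma, so the sweeps
converge. Once the values at the sampled successor states are fixed,
kernel_backup can be evaluated at any query state, as done for v_low
and v_high above.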
--------------------------------------------
Dirk Ormoneit
Department of Statistics, Room 206
Stanford University
Stanford, CA 94305-4065
ph.: (650) 725-6148
fax: (650) 725-8977
ormoneit at stat.stanford.edu
http://www-stat.stanford.edu/~ormoneit/